Abstract

This paper introduces CLEO, a novel preference elicitation algorithm capable
of recommending complex objects in hybrid domains, characterized by both
discrete and continuous attributes and constraints defined over them. The
algorithm assumes minimal initial information, i.e., a set of catalog attributes,
and defines decisional features as logic formulae combining Boolean and
algebraic constraints over the attributes. The (unknown) utility of the decision
maker (DM) is modelled as a weighted combination of features. CLEO
iteratively alternates a preference elicitation step, where pairs of candidate
solutions are selected based on the current utility model, and a refinement step
where the utility is refined by incorporating the feedback received. The
elicitation step leverages a Max-SMT solver to return optimal hybrid solutions
according to the current utility model. The refinement step is implemented
as learning to rank, and a sparsifying norm is used to favour the selection of
few informative features in the combinatorial space of candidate decisional
features.

CLEO is the first preference elicitation algorithm capable of dealing with
hybrid domains, thanks to the use of Max-SMT technology, while handling
uncertainty in the DM utility and noisy feedback. In so doing, it adapts
the recently introduced learning modulo theories framework to the preference
elicitation setting. The combinatorial formulation of the utility function,
coupled with the feature selection capabilities of 1-norm regularization, makes it
possible to deal effectively with the uncertainty in the DM utility while retaining high
expressiveness. Experimental results on complex recommendation tasks show
the ability of CLEO to quickly focus on optimal solutions, as well as
its capacity to recover from suboptimal initial choices. While no competitors
exist in the hybrid setting, CLEO outperforms a state-of-the-art Bayesian
preference elicitation algorithm when applied to a purely discrete task.

Keywords: preference elicitation, learning while optimizing, (Maximum)
Satisfiability Modulo Theory, hybrid optimization.


1. Introduction 

Automatically discovering the solution preferred by a decision maker
(DM) from a large set of candidates is a key component of many systems,
including decision-support, recommendation algorithms and personal
agents. This task is usually referred to as the preference elicitation
problem [1]. In principle, one may first ask the user to express her preferences
and then translate them into a utility function defined over the search space
of candidate solutions. The configuration maximizing the utility function is
then recommended to the DM. However, this approach is impractical for several
reasons [2]:

• the user usually cannot define her preferences a priori, without seeing
any tentative results. Only when facing candidate solutions may she
realize “what is possible” and articulate her actual objectives;

• the cognitive effort and the time required of the user to completely
specify her preferences are usually not affordable;

• in general, formalizing the user preferences as a mathematical model is
not trivial: a model should capture the qualitative notion of preference
and represent it as a quantitative function.

To handle the initially incomplete knowledge of the user utility, an incremental
approach is usually adopted, where a configuration is recommended to
the user based on partial preference information only. If the user is not
satisfied by the tentative solution, she is asked for additional preference
information and a refined configuration is suggested. This incremental process
needs techniques that can reason with partially specified utility functions
and make decisions under uncertain preference information. Furthermore,
human decision makers have limited patience and bounded rationality: this
limits both the number and the complexity of the queries asked during the
elicitation process, bounds the time available for providing the
recommendations, and requires dealing with inaccurate and inconsistent human
feedback.

The main requirements for the practical applicability of preference elicitation
are [3]:

1. real-time interaction with the DM, where both the query generation
and the solution recommendation must be accomplished in no more
than a few seconds;

2. robustness to the inconsistent and contradictory feedback that
characterizes the typical human decision-making process;

3. cognitively affordable queries to the user, i.e., comparison queries;

4. scalable methods, which at each preference elicitation stage evaluate a
number of candidate queries that grows at most linearly in the
cardinality of the solution space.

Different approaches to preference elicitation have been proposed. Usually,
a parametric formulation of the space of possible DM utility functions
is adopted. A set of basis functions is defined on subsets of the attributes,
and the utility model is formulated as a weighted linear combination of these
basis functions.

Approaches to preference elicitation can be classified by the way they
make recommendations under uncertainty in the weight values. Uncertainty
in the DM utility can be represented, for instance, by defining a space of feasible
weights, identified by bounds or constraints on the values. These constraints
are learned from the preference information elicited from the DM. This popular
approach, known in the literature as reasoning under strict uncertainty,
is adopted in IHEIIS]. In these papers, decisions under uncertainty are made
according to the minimax regret criterion: the configuration minimizing the
worst-case loss with respect to the feasible utility functions is recommended.
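To make the minimax regret criterion concrete, here is a minimal Python sketch. It assumes (our assumption, not a detail taken from the cited works) that each candidate is described by its vector of basis-function values and that the feasible weight space is a polytope given by its vertices. Since the regret of recommending x instead of y is linear in the weights, its maximum over the polytope is attained at a vertex, so checking the vertices suffices. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def utility(w: Vector, phi: Vector) -> float:
    """Linear utility w^T phi of a candidate with basis-function values phi."""
    return sum(wi * fi for wi, fi in zip(w, phi))


def max_regret(x: Vector, candidates: Sequence[Vector], vertices: Sequence[Vector]) -> float:
    """Worst-case loss of recommending x: the largest utility gap to any
    alternative y, over all vertices w of the feasible weight polytope."""
    return max(utility(w, y) - utility(w, x) for w in vertices for y in candidates)


def minimax_regret_choice(candidates: Sequence[Vector],
                          vertices: Sequence[Vector]) -> Tuple[int, float]:
    """Index and value of the candidate minimizing the maximum regret."""
    regrets: List[float] = [max_regret(x, candidates, vertices) for x in candidates]
    best = min(range(len(candidates)), key=regrets.__getitem__)
    return best, regrets[best]


if __name__ == "__main__":
    # Two basis functions; the DM weights may lie anywhere between (1, 0) and (0, 1).
    cands = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.6)]
    print(minimax_regret_choice(cands, [(1.0, 0.0), (0.0, 1.0)]))
```

In this toy instance the two extreme candidates each risk a regret of 1, while the balanced candidate (index 2) risks only 0.4, so it is the one recommended.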

The Bayesian approach [3 El Eli maintains a probability distribution 
over the space of all possible weight values. Decisions are taken according 
to this probability distribution: the recommended solution is usually the one 
with greatest expected utility. 
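The expected-utility rule can be sketched with Monte Carlo samples of the weights. The sketch below assumes, purely for illustration, an independent Gaussian belief over each weight; the Bayesian methods cited above use their own priors and update rules, which are not reproduced here. Because the utility is linear, the expectation equals the utility under the mean weights. Sampling is shown because the same estimator also applies when that shortcut is unavailable.

```python
import random
from typing import List, Sequence


def sample_weights(mean: Sequence[float], std: Sequence[float], n: int,
                   seed: int = 0) -> List[List[float]]:
    """Draw n weight vectors from independent Gaussians (a hypothetical belief)."""
    rng = random.Random(seed)
    return [[rng.gauss(m, s) for m, s in zip(mean, std)] for _ in range(n)]


def expected_utility(x: Sequence[float], samples: Sequence[Sequence[float]]) -> float:
    """Monte Carlo estimate of E_w[w^T x] under the sampled belief."""
    total = sum(sum(wi * xi for wi, xi in zip(w, x)) for w in samples)
    return total / len(samples)


def recommend(candidates: Sequence[Sequence[float]],
              samples: Sequence[Sequence[float]]) -> int:
    """Index of the candidate with the greatest estimated expected utility."""
    return max(range(len(candidates)),
               key=lambda i: expected_utility(candidates[i], samples))
```

For example, with weight means (1, -1) the candidate (1, 0) has the highest expected utility among (1, 0), (0, 1) and (0.5, 0.5).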
Recent work in the field of constraint programming uni formalizes the
user preferences in terms of soft constraints. In soft constraints, a generalization
of hard constraints, each assignment to the variables of one constraint
is associated with a preference value. The work in [10] introduces a preference
elicitation strategy for soft constraint problems with missing preference
values.

However, neither the works H El E] based on the minimax regret nor
the constraint-based approach [10] can handle inaccurate and contradictory
human feedback. The Bayesian method proposed in [3] satisfies all main
requirements for practical applicability discussed above. However, it can handle
discrete attributes only, and it is hardly generalizable to the continuous case.

This paper introduces a novel algorithm which satisfies all the main principles
for the practical applicability of preference elicitation, can deal with
hybrid domains and, when applied to purely Boolean problems, consistently
improves on the state of the art in terms of number of queries and quality of the
returned solution. The approach adopts a combinatorial formulation of the
user utility function, modelled as a weighted combination of first-order logic
formulae. Each formula combines predicates in a certain theory of interest
by using the logical connectives. The theory fixes the interpretation of the
symbols used in the predicates (e.g., the theory of arithmetic for dealing with
integer or real numbers). For example, consider the case of flight selection.
The predicate $\varphi_1 = (A_1 + A_2 < 5\ \text{hours})$ defines the preference for a travel
duration, calculated as flight duration (continuous attribute $A_1$) plus transfer
time to the departure airport ($A_2$), smaller than five hours. The predicate
$\varphi_2 = (A_3 < 2)$ states the desirability of a flight with a number of stopovers
(discrete attribute $A_3$) smaller than two. The DM preferences about the
candidate flights are expressed by associating the two predicates $\varphi_1$ and $\varphi_2$
with weights $w_1$ and $w_2$, respectively.¹ The flight maximizing the sum of the
weights of the satisfied predicates is the one preferred by the DM.
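The flight example can be written out directly. In the minimal Python sketch below, the flight data and the weights $w_1 = 2$, $w_2 = 1$ are hypothetical values chosen for illustration, and the preferred flight is found by scanning an explicit list; the approach described in this paper instead searches the space with a Max-SMT solver.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Flight:
    name: str
    duration_h: float  # A1: flight duration in hours (continuous)
    transfer_h: float  # A2: transfer time to the departure airport in hours
    stopovers: int     # A3: number of stopovers (discrete)


def phi1(f: Flight) -> bool:
    """Travel duration (flight plus transfer) smaller than five hours."""
    return f.duration_h + f.transfer_h < 5.0


def phi2(f: Flight) -> bool:
    """Fewer than two stopovers."""
    return f.stopovers < 2


def utility(f: Flight, w1: float, w2: float) -> float:
    """Sum of the weights of the satisfied predicates."""
    return w1 * int(phi1(f)) + w2 * int(phi2(f))


def preferred(flights: List[Flight], w1: float, w2: float) -> Flight:
    """Flight with the highest utility under the given weights."""
    return max(flights, key=lambda f: utility(f, w1, w2))
```

With flights F1 (3.5 h + 1 h, two stopovers), F2 (4.5 h + 1 h, direct) and F3 (2 h + 1.5 h, one stopover), the utilities are 2, 1 and 3, so F3 is preferred.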

The configuration maximizing the weighted combination of the first-order
logic formulae is identified by applying a Maximum Satisfiability Modulo
Theory (Max-SMT) solver [11]. Max-SMT is a powerful recent formalism
to optimize weighted formulae in a decidable first-order theory. Max-SMT
makes it possible to describe candidate solutions of the preference elicitation task by
using both discrete and continuous attributes simultaneously (hybrid search
domain), thus improving on the state of the art of preference elicitation, which
cannot handle hybrid search domains. Furthermore, Max-SMT makes it possible to
manage complex non-linear interactions among the attributes (for example,
a cost attribute defined as a function of the remaining attributes), increasing
the expressiveness. Learning modulo theories was recently introduced na as
a framework for adapting structured-output learning to hybrid domains by
leveraging Max-SMT technology. This paper adapts the framework to deal
with preference elicitation tasks.

¹ In this simple example, each formula consists of a single predicate only. In the general
case, arbitrary logic formulae (e.g., conjunctions or disjunctions of possibly negated
predicates) are considered.

The approach presented in this paper assumes a very limited amount
of prior information about the task to be solved. The initial knowledge is
limited to a set of catalog attributes used to describe the candidate solutions.
The combinatorial formulation of the DM utility over the catalog attributes
is initially unknown and needs to be learned by interacting with the DM. For
this purpose, our approach consists of an iterative algorithm, alternating a
preference elicitation step guided by the currently learned utility function
and a refinement step where the quality of the utility function is improved
according to the feedback received. In the preference elicitation step, two
candidate configurations are selected according to the current utility and
presented to the DM for comparison. The refinement step consists of solving
a ranking problem which outputs a refined utility function consistent with the
feedback received (soft consistency is allowed to deal with noisy feedback).
The feature space of the utility function is given by all possible first-order
logic formulae combining the predicates up to a certain degree. Only a small
fraction of these candidate features is actually part of the unknown utility of
a certain DM [13]. A sparsifying norm [TT] is used during training in order to
favour utility functions with few non-zero weights, thus performing constraint
selection in the combinatorial space of candidate features. In the rest of this
paper the algorithm is referred to by the acronym CLEO, which stands for
unknown Combinatorial utility function joint LEarning and Optimization.

An experimental evaluation on realistic problems defined over hybrid domains
(i.e., with both discrete and continuous decisional attributes) and with
inaccurate human feedback demonstrates the effectiveness of CLEO in focusing
on the optimal solutions, its robustness to noisy learning signals and
its ability to recover from suboptimal initial choices. While no competitors
exist in the general case of hybrid domains, we provide an experimental
comparison on the simplified task of learning purely Boolean combinatorial
functions. Thanks to its ability to learn complex non-linear interactions between
attributes, CLEO outperforms a state-of-the-art Bayesian preference
elicitation approach [3].

A preliminary version of CLEO was presented in |T^. This manuscript
extends it in a number of directions. First, it replaces the quantitative judgments
asked of the DM with less cognitively demanding queries, consisting of pairwise
preferences between candidate solutions. Second, it considerably extends the
experimental evaluation, including a more realistic recommendation problem.
Third, it provides a deeper comparison with the preference elicitation literature,
and adds an experimental comparison with a state-of-the-art preference
elicitation technique.

The paper is organized as follows. Section 2 introduces the
terminology and the notation used in the paper, focusing in particular on
the Max-SMT formalism. A small introductory example of the preference
elicitation task follows (Sec. 3). The CLEO algorithm is introduced in Sec. 4;
the subsequent sections analyze some of its main properties, discuss related
work and report the experimental evaluation. Finally, a discussion including
potential future research directions concludes the paper.

2. Notation and background 

This section provides the necessary background to introduce the CLEO
algorithm. The Satisfiability Modulo Theory (SMT) formalism for solving
decision problems over hybrid domains is explained, followed by its
generalization (Max-SMT) to handle optimization tasks. Table 1 summarizes the
notation used throughout the paper.
Symbol            Meaning

⊤, ⊥              Boolean values true and false
x, y, z, ...      Rational variables
A1, A2, ...       Catalog attributes (Boolean or rational variables)
A                 Configuration (assignment of values to all catalog attributes)
A^i               i-th configuration
φ_k               Constraints. They can be atomic (Boolean attributes or predicates
                  over rational attributes, e.g. x + y < 3) or the combination of
                  atomic constraints by the logical connectives
                  (e.g. ¬has_car → dist_supermarket < 6)
I_k(A)            Indicator function for constraint φ_k over A. It evaluates to one
                  if φ_k is satisfied, to zero otherwise.
ψ(A)              Feature (i.e., constraint) representation of configuration A
ψ_k(A) = I_k(A)   Feature associated with constraint φ_k
w                 Weights

Table 1: Explanation of the notation used throughout the text.


2.1. Satisfiability Modulo Theory 

Propositional logic considers formulae involving Boolean variables and
logical connectives. The satisfiability (SAT) problem consists of deciding
whether a formula in propositional logic can be satisfied by a truth value
assignment to the Boolean variables. Satisfiability Modulo Theory (SMT) [16]
[H] extends SAT to decide the satisfiability of a first-order formula with
respect to a background theory T, like linear arithmetic over the rationals
($\mathcal{LRA}$) or integers ($\mathcal{LIA}$), or a combination of theories. First-order logic
involves variables, functions and predicates; the theory T fixes the
interpretation of predicate and function symbols. For example, given the following
SMT formula from the theory of arithmetic over integers:

$$x + y + z < 4, \qquad x, y, z \in \{1, 2, 3\}$$

we are interested in deciding whether there is an assignment of integer values
to the variables $x$, $y$ and $z$ satisfying the formula. In this paper, SMT(T)
indicates satisfiability modulo theory T, e.g., SMT($\mathcal{LRA}$) for satisfiability
modulo linear arithmetic over the rationals.
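Because the variables in this example range over a finite set, satisfiability can be decided by plain enumeration. The Python sketch below does exactly that; it only illustrates what the decision problem asks, since real SMT solvers reason symbolically and also handle unbounded domains. The helper name `find_model` is hypothetical.

```python
from itertools import product
from typing import Callable, Dict, Iterable, Optional


def find_model(domain: Iterable[int],
               formula: Callable[[int, int, int], bool]) -> Optional[Dict[str, int]]:
    """Return an assignment to x, y, z satisfying formula, or None if none exists."""
    values = list(domain)
    for x, y, z in product(values, repeat=3):
        if formula(x, y, z):
            return {"x": x, "y": y, "z": z}
    return None


print(find_model(range(1, 4), lambda x, y, z: x + y + z < 4))  # {'x': 1, 'y': 1, 'z': 1}
print(find_model(range(1, 4), lambda x, y, z: x + y + z < 3))  # None
```

The first formula is satisfiable only by x = y = z = 1; tightening the bound to 3 makes it unsatisfiable over {1, 2, 3}.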
Current SMT solvers are based on the so-called lazy approach, where
an outer SAT solver interacts with one or more specialized T-solvers (one
for each theory) in order to progressively focus the search towards
theory-consistent solutions or to establish the unsatisfiability of the input SMT formula.
A T-solver is a specialized reasoning method for the theory T, integrated as a
submodule in the SMT solver. Usually a T-solver is a decision procedure
developed to check the satisfiability of conjunctions of literals (i.e., atomic
formulae and their negations) over theory T. The generalization to arbitrary
propositional structures is handled in conjunction with the SAT solver
integrated in the SMT solver. For ease of exposition, a single theory is
assumed here, but all the machinery described can be applied to arbitrary
combinations of theories.

Let $A$ be an SMT formula made of predicates in a certain theory T.
Its Boolean abstraction $A^B$ is obtained by replacing each $i$-th theory-specific
predicate in $A$ with a Boolean variable $\varphi_i$, producing a formula in plain
propositional logic. If this propositional formula is unsatisfiable, the original
formula $A$ is also unsatisfiable and the whole SMT solver stops. Otherwise,
the SAT solver finds a truth value assignment to the Boolean variables $\varphi_i$
satisfying $A^B$, and presents it to the T-solver to check for theory consistency.
The T-solver searches for an assignment of values to the theory variables
which is consistent with the solution provided by the SAT solver: if the
Boolean variable $\varphi_i$ is assigned the value true (false), the corresponding
$i$-th predicate must (not) be satisfied by the values assigned to the theory
variables. Predicates are evaluated using the rules of the theory T. If the
T-solver detects an inconsistency, it returns unsat, plus a justification, i.e.,
a subset of the truth value assignment provided by the SAT solver which is
unsatisfiable according to the theory. The justification is an explanation of
the inconsistency detected. The negation of this justification is added to the
Boolean abstraction, and the process is repeated until a theory-consistent
solution is found, or the refined formula is not satisfiable.

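Example 2.1. Let $A$ be the following SMT($\mathcal{LIA}$) formula:

$$(x + y + z \le 3) \wedge (x < y \vee z = 2) \wedge (x > 2 \vee x \neq z)$$

where $x, y, z$ are integer-valued variables. Its Boolean abstraction $A^B$ is:

$$\varphi_1 \wedge (\varphi_2 \vee \varphi_3) \wedge (\varphi_4 \vee \varphi_5).$$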
Suppose the SAT solver finds the following truth assignment satisfying $A^B$:

$$\varphi_1 = \top,\ \varphi_2 = \top,\ \varphi_3 = \top,\ \varphi_4 = \top,\ \varphi_5 = \top.$$

It corresponds to the following SMT($\mathcal{LIA}$) formula:

$$(x + y + z \le 3) \wedge (x < y) \wedge (z = 2) \wedge (x > 2) \wedge (x \neq z).$$

When asked to evaluate this formula, the T-solver detects that it is theory
inconsistent: if $z$ is set to 2 and both $x$ and $y$ must be larger than 2, the
sum of the three variables cannot be less than or equal to 3. A justification
provided by the T-solver to explain the inconsistency may be, e.g., the
following constraint:

$$\neg(\varphi_1 \wedge \varphi_2 \wedge \varphi_3 \wedge \varphi_4)$$

which is added to $A^B$ for the subsequent calls to the SAT solver. A possible
solution provided by the SAT solver for the refined Boolean abstraction:

$$\varphi_1 \wedge (\varphi_2 \vee \varphi_3) \wedge (\varphi_4 \vee \varphi_5) \wedge \neg(\varphi_1 \wedge \varphi_2 \wedge \varphi_3 \wedge \varphi_4)$$

is the following truth assignment: 

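$$\varphi_1 = \top,\ \varphi_2 = \bot,\ \varphi_3 = \top,\ \varphi_4 = \bot,\ \varphi_5 = \top,$$

corresponding to the theory formula:

$$(x + y + z \le 3) \wedge (x \ge y) \wedge (z = 2) \wedge (x \le 2) \wedge (x \neq z).$$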

The T-solver detects that this formula is theory consistent. It is satisfied, 
e.g., by the assignment: 

$$x = 1,\ y = 0,\ z = 2.$$

The search process of the overall SMT solver now stops, since a solution of 
the input formula A has been found. 
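The lazy loop can be reproduced on Example 2.1 with a toy Python sketch. Both solvers are stand-ins assumed for illustration only: the "SAT solver" enumerates truth assignments, the "T-solver" searches integer values in the bounded box [-5, 5] (a real LIA solver is not bounded, so the toy could miss models outside the box), and the justification is simply the whole failed assignment rather than the smaller justification used in the example. For that reason the trace differs from the one above.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple

# Theory atoms of Example 2.1; key i stands for the Boolean variable phi_i.
ATOMS: Dict[int, Callable[[int, int, int], bool]] = {
    1: lambda x, y, z: x + y + z <= 3,
    2: lambda x, y, z: x < y,
    3: lambda x, y, z: z == 2,
    4: lambda x, y, z: x > 2,
    5: lambda x, y, z: x != z,
}
# Boolean abstraction phi_1 & (phi_2 | phi_3) & (phi_4 | phi_5) in CNF;
# literal +i means "phi_i is true", -i means "phi_i is false".
CLAUSES: List[List[int]] = [[1], [2, 3], [4, 5]]

Assignment = Dict[int, bool]
Model = Tuple[int, int, int]


def sat_solve(clauses: List[List[int]], n_vars: int) -> Optional[Assignment]:
    """Toy SAT solver: return the first truth assignment satisfying all clauses."""
    for values in product([True, False], repeat=n_vars):
        a = {i + 1: v for i, v in enumerate(values)}
        if all(any(a[abs(lit)] == (lit > 0) for lit in clause) for clause in clauses):
            return a
    return None


def theory_check(a: Assignment, box: range = range(-5, 6)) -> Optional[Model]:
    """Toy T-solver: look for integers x, y, z in a bounded box such that every
    atom takes exactly the truth value demanded by the SAT assignment."""
    for x, y, z in product(box, repeat=3):
        if all(ATOMS[i](x, y, z) == v for i, v in a.items()):
            return (x, y, z)
    return None


def lazy_smt(clauses: List[List[int]], n_vars: int) -> Tuple[Optional[Model], List[List[int]]]:
    learned: List[List[int]] = []
    while True:
        a = sat_solve(clauses + learned, n_vars)
        if a is None:
            return None, learned  # Boolean abstraction unsatisfiable: formula is unsat
        model = theory_check(a)
        if model is not None:
            return model, learned  # theory-consistent solution found
        # Naive justification: the whole assignment; add its negation as a clause.
        learned.append([-i if v else i for i, v in a.items()])


if __name__ == "__main__":
    model, learned = lazy_smt(CLAUSES, 5)
    print(model, len(learned))
```

Run as is, the sketch first rejects the all-true assignment (as in the example), then one more assignment in which $x = z = 2$ contradicts $x > 2$, and finally returns $x = -5$, $y = -4$, $z = 2$, which satisfies $A$.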

These solvers are termed lazy because of this incremental approach, which
generates constraints on demand, progressively refining the Boolean abstraction
$A^B$ by including additional theory-specific information.

Modern lazy SMT solvers introduce a number of refinements to this basic
procedure, by pursuing a tighter integration between SAT and theory
solvers. A common approach consists of pruning the search space for the
SAT solver by calling the theory solver on partial assignments and propagating
its results. Furthermore, modern lazy SMT solvers combine solving
techniques from very heterogeneous domains. We refer the reader to HU OK]
for an overview of lazy SMT solving.
2.2. Max-SMT

Max-SMT [19, 20, 21] generalizes SMT in the same way as Max-SAT does
with SAT: rather than an assignment satisfying the input SMT formula, one
maximizing the number of satisfied constraints is searched for. The weighted
version of Max-SMT associates a (typically positive) weight with each
constraint, and the task is that of maximizing the weighted sum of the satisfied
constraints.

Let $\{(\varphi_1, w_1), \ldots, (\varphi_m, w_m)\}$ be a set of constraints with associated
non-negative weights. The utility of any assignment is clearly smaller than or equal
to the sum of all weights $\overline{W} = \sum_{i=1}^{m} w_i$, and larger than or equal to
zero. The maximum-utility solution is identified by a branch and bound
strategy, which progressively tightens the upper and lower utility bounds
and solves plain SMT problems encoding these bounds in their formulation.
Given a lower bound $\underline{W} < \overline{W}$, a solution is enforced to have a utility
larger than $\underline{W}$ by generating a set of $m$ fresh Boolean variables and weights
$\{(\varphi'_1, w'_1), \ldots, (\varphi'_m, w'_m)\}$ combined with the following constraints [T^:

$$
\begin{aligned}
&\varphi'_i \leftrightarrow \varphi_i && \forall i \in \{1, 2, \ldots, m\}\\
&\neg\varphi'_i \rightarrow (w'_i = 0) && \forall i \in \{1, 2, \ldots, m\}\\
&\varphi'_i \rightarrow (w'_i = w_i) && \forall i \in \{1, 2, \ldots, m\}\\
&\textstyle\sum_{i=1}^{m} w'_i > \underline{W}
\end{aligned}
$$

These constraints make any assignment with overall weight not larger than $\underline{W}$
inconsistent with the theory.
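A toy Python sketch of the bound-tightening idea follows, under simplifying assumptions of our own: the "SMT solver" is brute-force enumeration over a small integer box, the soft constraints and weights are hypothetical, and only the lower bound is raised (a linear search rather than a full branch and bound). Each call asks whether some assignment satisfies the encoding above, i.e., whether the collected weights $w'_i$ sum to more than the current $\underline{W}$.

```python
from itertools import product
from typing import Callable, List, Optional, Tuple

Soft = Tuple[Callable[[int, int], bool], float]
Result = Tuple[Tuple[int, int], float]

# Hypothetical weighted soft constraints (phi_i, w_i) over integers x, y.
SOFT: List[Soft] = [
    (lambda x, y: x + y >= 5, 3.0),
    (lambda x, y: x <= 1, 2.0),
    (lambda x, y: x == y, 1.0),
]


def smt_call(soft: List[Soft], lower: Optional[float],
             domain: range = range(4)) -> Optional[Result]:
    """Toy SMT check: find (x, y) whose collected weights w'_i (w_i if phi_i
    holds, 0 otherwise) sum to more than `lower`; lower=None accepts anything."""
    for x, y in product(domain, repeat=2):
        total = sum(w if phi(x, y) else 0.0 for phi, w in soft)
        if lower is None or total > lower:
            return (x, y), total
    return None


def max_smt(soft: List[Soft]) -> Optional[Result]:
    upper = sum(w for _, w in soft)  # upper bound: all constraints satisfied
    best = smt_call(soft, None)
    while best is not None and best[1] < upper:
        better = smt_call(soft, best[1])  # demand strictly more than the lower bound
        if better is None:
            break  # no assignment beats the bound: best is optimal
        best = better
    return best
```

Here the first call returns $(0, 0)$ with utility 3, the second raises the bound to utility 4 at $(3, 3)$, and the third call finds nothing better, so `max_smt(SOFT)` returns `((3, 3), 4.0)`.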

3. An introductory example 

Let us consider a customer who aims at building her own house. For this
purpose, she asks a real-estate company about potential housing locations.
A very clear-headed person could formulate a request like:

I would like a house in a safe area, close to my parents and to
the kindergarten, with a garden if there are no parks nearby. I
would also like to live close to cycling and walking facilities. Of
course, to fully enjoy these outdoor activities, the area should not
be affected by air pollution. Finally, I prefer a site well served by
public transport, with the nearest metro station easily reachable
on foot. My maximum budget is 300,000 Euro.

These desiderata can be encoded as an SMT problem as follows: 

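solve:

$$(\varphi_1 \vee \varphi_2) \wedge \varphi_3 \wedge \varphi_4 \wedge \varphi_5 \wedge \varphi_6 \wedge \varphi_7 \wedge \varphi_8 \wedge \varphi_9$$

subject to:

$$
\begin{aligned}
\varphi_1 &= A_2\\
\varphi_2 &= A_1\\
\varphi_3 &= (A_3 < \theta_1)\\
\varphi_4 &= (A_4 < \theta_2)\\
\varphi_5 &= (A_5 < \theta_3)\\
\varphi_6 &= A_6\\
\varphi_7 &= (A_7 < \theta_4)\\
\varphi_8 &= (A_8 < \theta_5)\\
\varphi_9 &= (A_9 < \theta_6)\\
\mathit{price}(A) &\le 300000
\end{aligned}
$$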

where the characteristics of the locations are defined by the set of catalog
attributes $A$ listed in Table 2. Function $\mathit{price}$ computes the price of location
$A$ based on the values of its attributes.


name   description                                          type

A1     garden                                               Bool
A2     park nearby                                          Bool
A3     crime rate                                           Ordinal
A4     distance from parents                                Real
A5     distance from kindergarten                           Real
A6     cycling and walking facilities in the neighborhood   Bool
A7     air-pollution index                                  Ordinal
A8     public-transit service quality index                 Ordinal
A9     distance from nearest metro station                  Real
A10    commercial facilities in the neighborhood            Bool
A11    distance from downtown                               Real

Table 2: Catalog attributes for the housing example.


If none of the locations available at the agency satisfies all constraints,
the above problem has no solution. A more reasonable alternative consists
of solving the optimization version of the above problem, which maximizes
the weighted sum of the satisfied constraints (i.e., a Max-SMT problem):

$$\arg\max_{A} \sum_{i=1}^{5} w'_i$$

subject to:

$$
\begin{aligned}
\varphi_1 &= (A_3 < \theta_1)\\
\varphi_2 &= (A_2 \vee A_1)\\
\varphi_3 &= (A_4 < \theta_2) \wedge (A_5 < \theta_3)\\
\varphi_4 &= A_6 \wedge (A_7 < \theta_4)\\
\varphi_5 &= (A_8 < \theta_5) \wedge (A_9 < \theta_6)\\
&\mathit{price}(A) \le 300000\\
&\varphi_i \rightarrow (w'_i = w_i) \qquad \forall i \in \{1, 2, \ldots, 5\}\\
&\neg\varphi_i \rightarrow (w'_i = 0) \qquad \forall i \in \{1, 2, \ldots, 5\}
\end{aligned}
$$

where each constraint $\varphi_i$ is associated with a weight $w_i$ quantifying the (relative)
utility of the constraint, so that $w'_i$ equals $w_i$ when $\varphi_i$ is satisfied and zero
otherwise. The bound on the price is a hard constraint that
needs to be satisfied, thus it has no weight.

A fully specified scenario like the one described here is, however, not
realistic when a human DM is involved. An exact specification of the set of
relevant constraints is hard to obtain, let alone their respective weights. The
most natural scenario consists of an interactive process, with the customer
evaluating candidate locations and the realtor updating her understanding
of the customer preferences according to the feedback received. The rest of
this paper introduces the CLEO algorithm, a preference elicitation method
that automates this process.

Let us finally note that not all the catalog attributes describing candidate
house locations may be relevant for a customer: in the above example
the customer decides without considering the last two attributes in Table 2.
A large list of catalog attributes enables both a fine-grained description of
the locations and the interaction with different classes of customers, having
different decision criteria. On the other hand, users are expected to make
decisions based on a limited set of attributes in the large catalogue. The
CLEO algorithm can identify the subset of catalog attributes relevant for a
certain customer.
4. The CLEO algorithm


This section introduces the CLEO algorithm, first describing its components
and then combining them into the overall algorithm.

Catalog attributes. CLEO assumes a catalogue of attributes which can
be used to describe the configurations. Each configuration is an instantiation
of the catalog attributes. These attributes can be either Boolean (e.g.,
there is a garden), ordinal (e.g., crime rate) or real (e.g., distance to
kindergarten) variables (see Table 2 in the previous example for a list). A
large number of attributes can be included, in order to increase the
expressiveness of the method and enable fine-grained descriptions of the
configurations. However, only a limited subset of the attributes may be relevant for
a specific decision maker, and, in general, the subset varies across
users. This section will show how CLEO identifies the subset
of relevant attributes.

Hard constraints. Some combinations of attribute values may be infeasible.
For example, in the above housing example, house locations with
cost value smaller than a given threshold may not be available. Arbitrarily
complex hard constraints define the feasible search space of candidate
configurations. The hard constraints are assumed to be known in advance. The
CLEO algorithm presents only feasible solutions to the DM during the
preference elicitation process.

Soft constraints. Soft constraints are defined over the catalog attributes.
A soft constraint may or may not be satisfied by a feasible configuration.
Each soft constraint is associated with a weight, defining the utility value of
the constraint. Positive weights are associated with constraints expressing
positive preferences of the DM (i.e., features that the preferred configuration
should have), while negative weights are associated with constraints articulating
negative preferences (i.e., features that the ideal configuration should not
have). The absolute value of the weight defines how relevant the soft constraint
is for the DM (w.r.t. the other soft constraints). A zero weight
identifies a constraint not considered by the DM.

Space of soft constraints. Soft constraints are atomic constraints or their
combinations. Atomic constraints are constructed from catalog attributes,
by simply taking their values for Boolean variables, and by constraining each
ordinal and real variable to be below a certain (variable-specific) threshold.
In the case of non-Boolean variables, atomic constraints are thus predicates
in first-order logic. More complex constraints can be constructed by
arbitrary combinations of these building blocks. For example,
$(\text{distance to kindergarten} < 6) \wedge (\text{distance to parents} < 6)$, so that a car is not
needed, or $\text{house with garden} \vee (\text{distance from nearest park} < 6)$, so
that open-air activities are possible.

These combinations are arbitrary logic formulae (e.g., conjunctions or
disjunctions) of up to $d$ atomic constraints. The maximal degree $d$ helps
limit the size of the soft constraint space, and is grounded on the bounded
rationality of humans, who can simultaneously handle only a limited number
of features.

The space of constructible soft constraints is clearly exponential in the
size of the catalogue. In the following we show how CLEO manages the large
dimensionality of the soft constraint space. For this purpose, let us define
here the mapping function $\psi(A)$ which projects configuration $A$ into the
space of all possible soft constraints, i.e., combinations of up to $d$ atomic
constraints. Each soft constraint $\varphi_k$ is associated with its indicator function
$I_k(A)$, which evaluates to one if the constraint is satisfied and to zero
otherwise. The feature (i.e., constraint) representation of configuration $A$ is the
vector obtained by concatenating the evaluations of all indicator functions:

$$\psi(A) = \left(I_1(A), I_2(A), \ldots\right)$$

In the following, the vector returned by function $\psi$ and the space of all
possible vectors returned by function $\psi$ will be referred to as feature vector
and feature space, respectively. The terms feature and constraint will thus
be used interchangeably.
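The construction of $\psi$ can be sketched in Python. The sketch makes simplifying assumptions of its own: combinations are restricted to conjunctions and disjunctions of distinct, non-negated atoms, thresholds are fixed by hand, and the attribute names are hypothetical. With $n$ atoms it yields $n + 2\sum_{k=2}^{d}\binom{n}{k}$ candidate features, which makes the combinatorial growth visible.

```python
from itertools import combinations
from typing import Callable, Dict, List, Sequence, Tuple

Config = Dict[str, float]
Feature = Tuple[str, Callable[[Config], bool]]


def atomic_constraints(boolean_attrs: Sequence[str],
                       thresholds: Sequence[Tuple[str, float]]) -> List[Feature]:
    """Boolean attributes are taken as they are; numeric ones get 'attr < t'."""
    atoms: List[Feature] = [(a, lambda c, a=a: bool(c[a])) for a in boolean_attrs]
    atoms += [(f"{a} < {t}", lambda c, a=a, t=t: c[a] < t) for a, t in thresholds]
    return atoms


def feature_space(atoms: List[Feature], d: int) -> List[Feature]:
    """All atoms plus conjunctions and disjunctions of 2..d distinct atoms."""
    feats = list(atoms)
    for k in range(2, d + 1):
        for combo in combinations(atoms, k):
            names = [name for name, _ in combo]
            preds = [pred for _, pred in combo]
            feats.append((" & ".join(names), lambda c, ps=preds: all(p(c) for p in ps)))
            feats.append((" | ".join(names), lambda c, ps=preds: any(p(c) for p in ps)))
    return feats


def psi(config: Config, feats: List[Feature]) -> List[int]:
    """Feature vector: indicator I_k(config) for every candidate constraint."""
    return [int(pred(config)) for _, pred in feats]
```

With two Boolean attributes (garden, park) and two thresholded distances, $d = 2$ already gives 16 candidate features and $d = 3$ gives 24; a location with a garden, no park, parents 3 km away and kindergarten 8 km away satisfies 8 of the 16.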

Combinatorial utility function. The DM utility function is represented
by a subset of the soft constraints defined over the catalog attributes. The
soft constraints involved in the definition of the utility function are associated
with a non-zero weight and encode the DM preferences. The utility of
a configuration is the sum of the weights of the soft constraints satisfied by
the configuration.

The above introduced feature vector $\psi(A)$ enables the following compact
formulation of utility function $f$:

$$f(A) = \mathbf{w}^\top \psi(A) \qquad (1)$$
where the weight vector $\mathbf{w}$ contains the weights associated with the candidate
soft constraints. Due to their bounded rationality and limited information-processing
capabilities, humans can handle only a limited number of features
to make decisions. Thus only very few of the candidate soft constraints will
actually be considered by the DM, resulting in an extremely sparse weight
vector $\mathbf{w}$. This sparsity assumption will be accounted for when introducing
the learning stage.

Learning phase. Learning amounts to finding the weights for the utility function
formulation in Eq. (1) matching the unknown DM preferences. Training
examples for this phase consist of candidate configurations with their
evaluation from the DM. Asking for quantitative feedback such as real-valued scores
is typically not affordable for a human DM [3]. A more realistic scenario
consists of asking the DM to rank solutions by preference. We can thus
formulate the problem as learning to rank, where the task is learning a
function returning the same ranking as the one provided by the DM. We focus
on the adaptation of SVM for ranking [22], which assumes pairwise ranking
preferences, and enforces a (soft) large margin between the two predictions.
However, we have an additional requirement, which is the sparsity assumption
on the weight vector $\mathbf{w}$. Indeed, the feature vector contains all possible
constraints (up to a certain complexity), and the learning phase should also
perform some form of constraint learning by selecting a small set of relevant
ones. Feature selection is in fact crucial to maximize the learning accuracy
with data sets characterized by redundant and irrelevant features [23]. We
favour feature selection by replacing the 2-norm of SVM with a 1-norm, which
is a sparsifying norm encouraging solutions with few non-zero weights [23].
The resulting learning problem is:


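$$
\begin{aligned}
\min_{\mathbf{w},\,\xi \ge 0}\quad & \|\mathbf{w}\|_1 + C \sum_{A^i \succ A^j} \xi_{ij}^2\\
\text{subject to:}\quad & \mathbf{w}^\top\big(\psi(A^i) - \psi(A^j)\big) \ge 1 - \xi_{ij} \qquad \forall\, A^i \succ A^j
\end{aligned}
\qquad (2)
$$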


where $A^i \succ A^j$ indicates that configuration $A^i$ is ranked before $A^j$ in the DM preference. The constraints enforce pairwise rankings that match the DM preferences. A quadratic penalty $\xi_{ij}^2$ is added to the objective function when a less preferred solution gets a utility score that is not sufficiently smaller than that of the more preferred one. The regularization parameter C trades off matching the DM preferences against the sparsity of the weight vector, and is optimized during the learning process as discussed further below.
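For fixed w the optimal slack is $\xi_{ij} = \max(0, 1 - \mathbf{w}^\top(\psi(A^i) - \psi(A^j)))$, so Eq. (2) is equivalent to minimizing $\|\mathbf{w}\|_1$ plus C times a squared hinge loss over the preference pairs. The Python sketch below solves this unconstrained form by proximal gradient descent (ISTA), whose soft-thresholding step is what produces exact zeros in w. The solver, the function names and the toy feature vectors are illustrative assumptions; this is not the implementation used by CLEO.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def soft_threshold(v: float, t: float) -> float:
    """Proximal operator of t * |v|: shrinks v towards zero (exact zeros)."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def learn_sparse_ranker(pairs: Sequence[Tuple[Vector, Vector]],
                        C: float = 1.0, iters: int = 2000) -> Vector:
    """Hypothetical ISTA solver for Eq. (2).

    pairs holds (psi(A_i), psi(A_j)) tuples, with A_i preferred to A_j.
    """
    if not pairs:
        raise ValueError("at least one pairwise preference is required")
    diffs = [[a - b for a, b in zip(p_i, p_j)] for p_i, p_j in pairs]
    dim = len(diffs[0])
    # Step 1/L, with L an upper bound on the Lipschitz constant of the loss gradient.
    lipschitz = 2.0 * C * sum(dot(d, d) for d in diffs)
    step = 1.0 / lipschitz if lipschitz > 0 else 1.0
    w = [0.0] * dim
    for _ in range(iters):
        grad = [0.0] * dim
        for d in diffs:
            slack = max(0.0, 1.0 - dot(w, d))  # optimal xi_ij for the current w
            for k in range(dim):
                grad[k] -= 2.0 * C * slack * d[k]
        # Gradient step on the squared hinge, then shrinkage for the 1-norm.
        w = [soft_threshold(w_k - step * g_k, step) for w_k, g_k in zip(w, grad)]
    return w


# Feature 0 explains both preferences; feature 1 pulls in opposite directions.
pairs = [([1.0, 1.0, 0.0], [0.0, 0.0, 0.0]),
         ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])]
w = learn_sparse_ranker(pairs, C=1.0)
# w is approximately [0.75, 0.0, 0.0]
```

In this toy case the objective reduces to $|w_0| + 2(1 - w_0)^2$, whose minimizer is $w_0 = 0.75$, while the two irrelevant features receive exactly zero weight.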

Optimization phase. The ultimate goal of the algorithm is to return the best possible instance given the DM utility function. However, since the utility function is unknown, a preference elicitation phase is needed to gather information on the DM preferences and use it to refine the current approximation f of her utility. CLEO asks the DM for pairwise comparisons of configurations. The two configurations to be compared by the DM are generated by optimizing the learned utility function f(A) twice. Since the learned utility function is a weighted combination of soft constraints involving Boolean variables and first-order logic predicates defined over discrete and continuous variables, it is optimized with an off-the-shelf Max-SMT solver, which can efficiently reason in these hybrid domains. The two optimization runs are guided by the following principles:

1. the generation of top-quality configurations, consistent with the learned DM preferences;

2. the generation of diversified configurations, i.e., alternative, possibly suboptimal configurations with respect to the learned utility f;

3. the search for catalog attributes relevant to the DM but not recovered by the current approximation f, i.e., attributes not appearing in any of the soft constraints in f.

The rationale for the first principle is to focus on the relevant areas of the utility surface, those of interest to the DM. As a matter of fact, a preference elicitation system that asks the DM to rank low-quality configurations is likely to be considered useless or annoying [3]. In addition, the goal of CLEO is the identification of the solution preferred by the user (learning to optimize) rather than an accurate global approximation of the DM utility function (learning per se). This requires a paradigm shift with respect to standard machine learning strategies: the relevant areas of the optimization fitness surface are modelled, rather than the whole surface being reconstructed.

The second principle advocates the introduction of some diversification in the search, by exploring the neighbourhood of the best solution for the currently learned preference model f. Finally, as the learned formulation of f may miss some of the user's decisional attributes, their search is explicitly promoted by the third principle. The need for a set of good and diverse configurations to be evaluated by the user is also suggested in [?].
Our optimization phase works as follows. First, f is maximized (first principle), generating the first candidate configuration A*. Then, a hard constraint, defined as the disjunction of all soft constraints not satisfied by A*, is added to the Max-SMT problem, and maximization is run again. This accounts for the second principle, by enforcing a new solution A** that differs from A* in at least one soft constraint. If A* satisfies all soft constraints in f, the additional hard constraint generated is instead a disjunction requiring at least one attribute to take a value different from its value in A*, which excludes A* from the set of feasible solutions.
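The two optimization runs can be illustrated on a toy problem. In the sketch below, an exhaustive search over a few Boolean attributes stands in for the Max-SMT solver, and soft and hard constraints are plain Python predicates. The brute-force search, the function names and the example constraints are all assumptions made for illustration; a real implementation would pass the extra hard constraint to a solver such as Yices instead.

```python
from itertools import product
from typing import Callable, Dict, List, Optional, Tuple

Assignment = Dict[str, bool]
Constraint = Callable[[Assignment], bool]
Soft = List[Tuple[float, Constraint]]


def maximize(attrs: List[str], soft: Soft,
             hard: List[Constraint]) -> Optional[Assignment]:
    """Exhaustive stand-in for a Max-SMT call over Boolean attributes."""
    best: Optional[Assignment] = None
    best_score = float("-inf")
    for values in product([False, True], repeat=len(attrs)):
        a = dict(zip(attrs, values))
        if not all(h(a) for h in hard):
            continue  # infeasible w.r.t. the hard constraints
        score = sum(w for w, c in soft if c(a))
        if score > best_score:  # ties keep the first assignment found
            best, best_score = a, score
    return best


def two_candidates(attrs: List[str], soft: Soft, hard: List[Constraint]
                   ) -> Tuple[Optional[Assignment], Optional[Assignment]]:
    """Return (A*, A**) following the diversification strategy."""
    a_star = maximize(attrs, soft, hard)
    if a_star is None:
        return None, None
    unsat = [c for _, c in soft if not c(a_star)]

    def extra(a: Assignment) -> bool:
        if unsat:
            # Satisfy at least one soft constraint violated by A*.
            return any(c(a) for c in unsat)
        # Fallback when A* satisfies every soft constraint: exclude A* itself.
        return a != a_star

    return a_star, maximize(attrs, soft, hard + [extra])


soft_constraints: Soft = [
    (5.0, lambda a: a["x"] and a["y"]),
    (3.0, lambda a: not a["y"] and a["z"]),
    (1.0, lambda a: a["z"]),
]
a1, a2 = two_candidates(["x", "y", "z"], soft_constraints, hard=[])
# a1 == {"x": True, "y": True, "z": True}; a2 == {"x": False, "y": False, "z": True}
```

Here A* violates only the second soft constraint, so A** is forced to satisfy it and ends up in a different region of the search space.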

Finally, each unassigned attribute, i.e., each catalog attribute not appearing in any hard constraint or in any soft constraint with non-zero weight, is given a random value in its domain in both A* and A**, thus incorporating the third principle. Indeed, if these catalog attributes are truly irrelevant for the DM, setting them at random should not affect the evaluation of the candidate solutions. On the other hand, if some of them are needed to explain the DM preferences, driving their elicitation can help identify the deficiencies of the current approximation f and recover previously discarded relevant decisional items.
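The randomization step can be sketched as follows, assuming that each constraint is summarized by the set of attribute names it mentions and that each domain is either Boolean or a real interval. The domain encoding, the helper names and the housing-flavoured attribute names are hypothetical.

```python
import random
from typing import Dict, Optional, Sequence, Set, Tuple

# Hypothetical domain encoding: None for Boolean, (low, high) for real-valued.
Domain = Optional[Tuple[float, float]]


def relevant_attributes(hard: Sequence[Set[str]],
                        soft: Sequence[Tuple[float, Set[str]]]) -> Set[str]:
    """Attributes mentioned by a hard constraint or a non-zero-weight soft one."""
    used: Set[str] = set()
    for attrs in hard:
        used |= attrs
    for weight, attrs in soft:
        if weight != 0.0:
            used |= attrs
    return used


def randomize_unassigned(config: Dict[str, object], domains: Dict[str, Domain],
                         used: Set[str], rng: random.Random) -> Dict[str, object]:
    """Copy of config where every attribute outside `used` is re-sampled."""
    out = dict(config)
    for name, dom in domains.items():
        if name not in used:
            out[name] = rng.random() < 0.5 if dom is None else rng.uniform(*dom)
    return out


domains = {"garden": None, "price": (100.0, 900.0), "near_park": None}
used = relevant_attributes(hard=[{"price"}],
                           soft=[(2.0, {"garden", "price"}), (0.0, {"near_park"})])
a_star = {"garden": True, "price": 350.0, "near_park": False}
a_star_randomized = randomize_unassigned(a_star, domains, used, random.Random(42))
```

Only `near_park` is re-sampled: its single soft constraint currently has zero weight, so its value in A* carries no information from the learned model.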

Overall algorithm. The pseudocode of the full CLEO algorithm is shown in Algorithm 1. It takes as input the set of catalog attributes, the set of atomic constraints and the set of hard constraints defining the feasible configurations, and returns the solution most preferred by the DM. In the initialization phase, the DM is asked for two pairwise comparisons of configurations selected by CLEO independently and uniformly at random in the feasible search space. Then a refinement loop begins, where at each iteration an approximation of the DM utility function is first learned using the current feedback. The refinement amounts to solving the “learning to rank” problem in Eq. (2), where $\mathcal{D}$ is the dataset of all pairwise preferences collected so far. The regularization parameter C is set to one in the first iteration, and fine-tuned by internal cross-validation on the training set in the following ones. With a slight abuse of notation, we write f ← argmin to indicate that f is the function whose weights w are the result of the minimization. The configuration A* maximizing the learned utility function f is recommended to the DM. If she is not satisfied with the suggested solution, an additional optimizer A** of f is generated, favouring diversity between A* and A** according to the diversification strategy defined above. The dataset $\mathcal{D}$ is then updated by including the comparison between A* and A** performed by the DM.
Data: set of catalog attributes, set of atomic constraints, set of hard constraints
Result: most preferred solution A*

/* Initialization */
Select three configurations uniformly at random
D ← ranking of configurations by DM
/* Refinement */
while true do
    /* learning */
    f ← argmin_{w, ξ ≥ 0}  ||w||_1 + C Σ_{A^i ≻ A^j ∈ D} ξ_ij^2
        s.t.  w^T (ψ(A^i) − ψ(A^j)) ≥ 1 − ξ_ij   ∀ A^i ≻ A^j ∈ D
    /* optimization */
    Recommend configuration A* = argmax f to the DM
    if termination criterion is not satisfied then
        /* preference elicitation */
        Generate A** by diversification strategy
        D ← D ∪ ranking of pair (A*, A**) by DM
    else
        return A*
    end
end

Algorithm 1: Pseudocode for CLEO.


Since CLEO is an interactive process involving a human DM, the most obvious termination condition is the DM's satisfaction with the current recommendation. Additional conditions could be conceived, for instance, by estimating the improvement one could expect from further refining the utility function. We will discuss this and other potential extensions in the conclusions.

5. CLEO properties 

The CLEO algorithm has no free parameters to be manually tuned. The number of iterations does not need to be fixed in advance. The DM may ask for an additional iteration after comparing the recommended configuration A* with her own preferences. The termination criterion is thus represented by the satisfaction of the DM with A*. The regularization parameter C in Eq. (2) is set to one in the first iteration, and fine-tuned by internal cross-validation on the training set in the following ones. In the first iteration, two pairwise comparisons are asked of the DM, while in the following iterations a single pairwise comparison is asked. The configurations to be compared in the first iteration are generated by sampling the feasible search space independently and uniformly at random. The evaluation of diverse examples stimulates the expression of preferences, especially when the user is still uncertain about her final preference [21]. In particular, the diversity of the proposed solutions helps the user to reveal hidden preferences: in many cases the decision maker is not aware of all her preferences until she sees them violated. For example, a user does not usually think about her preference for an intermediate airport until a solution suggests a change of plane in a place she dislikes [21].

Human cognitive capabilities bound the number of catalog attributes and the size d of soft constraints. The limited size of the Max-SMT instances generated by CLEO enables the systematic investigation of the search space by means of a complete solver, which ensures the identification of a global maximum A* of the learned utility model f (completeness property). However, CLEO cannot guarantee the quality of the model f approximating the true DM utility, and therefore the optimality of A* (or bounds on its quality) with respect to the true DM utility cannot be proved. As a matter of fact, the learning task in Eq. (2) is convex, and thus guaranteed to converge to its global optimum, but the consistency of the learning algorithm with the true underlying user utility is only guaranteed asymptotically (i.e., provided that enough training data are available). On the other hand, CLEO does not need to learn the exact form of the DM utility function. The goal of our approach is indeed to elicit as little preference information from the DM as possible in order to identify her favourite solution (learning to optimize). For example, consider the toy DM utility function represented by the negation of a single ternary term, $\neg(\varphi_1 \wedge \varphi_2 \wedge \varphi_3)$. The approximation of the DM utility function consisting of the formula $\neg\varphi_1$ is sufficient to find one of the favourite DM solutions, since any configuration violating $\varphi_1$ also satisfies $\neg(\varphi_1 \wedge \varphi_2 \wedge \varphi_3)$. More generally, only the shape of the utility function that locally guides the search in the correct direction is actually needed.
Indeed, the experimental results reported in Sec. 7 show the ability of CLEO to identify the optimal solution, and the improvement in the quality of the candidate solutions as the number of refinement iterations increases (anytime property).
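The toy example of the utility $\neg(\varphi_1 \wedge \varphi_2 \wedge \varphi_3)$ can be checked by enumeration. The Python snippet below treats the three atomic constraints as independent Boolean values, which is a simplifying assumption (in a real hybrid problem their truth values may be correlated), and verifies that every maximizer of the approximation $\neg\varphi_1$ is also a maximizer of the true utility.

```python
from itertools import product


def true_utility(p1: bool, p2: bool, p3: bool) -> int:
    return int(not (p1 and p2 and p3))  # not(phi1 and phi2 and phi3)


def approx_utility(p1: bool, p2: bool, p3: bool) -> int:
    return int(not p1)  # learned approximation: not phi1


worlds = list(product([False, True], repeat=3))
best_true = max(true_utility(*w) for w in worlds)
best_approx = max(approx_utility(*w) for w in worlds)
approx_optima = [w for w in worlds if approx_utility(*w) == best_approx]

# Every maximizer of the approximation also maximizes the true utility.
assert all(true_utility(*w) == best_true for w in approx_optima)
print(len(approx_optima), "of", sum(true_utility(*w) == best_true for w in worlds))
# prints: 4 of 7
```

The approximation recovers only 4 of the 7 optimal assignments, which does not matter when the goal is to recommend one optimal solution rather than to reconstruct the whole utility.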

Finally, CLEO satisfies the main requirements for practical applicability of preference elicitation. In detail:

1. multi-attribute models. Candidate configurations are described by multiple decisional attributes. Since these attributes usually vary with different decision makers, CLEO assumes a set of catalog attributes, from which the decisional items of a specific DM are automatically selected. Unlike state-of-the-art methods for preference elicitation (see Sec. 6), CLEO can handle both discrete and continuous-valued attributes simultaneously, thanks to the Max-SMT formalism, which can efficiently tackle hybrid domains;

2. real-time interaction with the DM. Due to the limited number n of catalog attributes and to the bounded size d of soft constraints, the learning phase (problem (2)) is accomplished in a negligible amount of time (w.r.t. the user response time). An analogous observation holds for the computational effort required by the optimization phase. Proposing a query consists of generating two candidates to be compared. Each candidate is obtained by a run of the complete Max-SMT solver. The bounded value of n and the efficiency of modern SMT solvers, which can manage problems with thousands of variables and millions of constraints, enable the completion of the optimization phase in a negligible amount of time;

3. robustness to inconsistent and contradictory human feedback. The adoption of regularized machine learning strategies in CLEO enables a robust approach that can handle inaccurate (pairwise) comparisons of solutions from the DM. Assuming that a user always provides accurate and consistent preference information is not realistic. Different factors may generate uncertain and inconsistent feedback from the DM, including occasional inattention, embarrassment when comparing very similar solutions or solutions which are very different from her favourite one, and DM fatigue increasing with the number of queries answered;

4. user cognitive load. CLEO asks the user just for pairwise comparisons of candidate solutions. Most users are typically more confident in comparing solutions, providing qualitative judgments like “I prefer solution A* to solution A**”, than in specifying how much they prefer A* over A**;

5. scalability. At each preference elicitation stage, just one candidate query is considered by CLEO, independently of the cardinality of the configuration space. The adoption of 1-norm regularization in the formulation of the learning problem requires that the input catalog attributes are explicitly projected into the feature space, i.e., the space of all possible soft constraints. Dealing with the explicit projection ψ in Eq. (1) is tractable only for a rather limited number of catalog attributes and size d of constraints. However, this will typically be the case when interacting with a human DM. Research in psychology has indeed shown that humans cannot simultaneously handle more than a few (7 ± 2) factors [?].
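The size of the explicit feature space can be made concrete. The sketch below assumes a simple layout for ψ, one binary feature per conjunction of 1 to d distinct atomic constraints without negated atoms; the actual feature set of CLEO may differ. With 40 atomic constraints (the number arising in the scheduling problem of Sec. 7) and d = 3, there are already 10,700 candidate features.

```python
from itertools import combinations
from math import comb
from typing import List, Sequence, Tuple


def conjunction_index(num_atoms: int, d: int) -> List[Tuple[int, ...]]:
    """All conjunctions of 1..d distinct atomic constraints (hypothetical layout)."""
    return [conj for k in range(1, d + 1)
            for conj in combinations(range(num_atoms), k)]


def psi(atom_values: Sequence[bool], index: List[Tuple[int, ...]]) -> List[int]:
    """Explicit projection: 1 where every atom of the conjunction holds, else 0."""
    return [int(all(atom_values[i] for i in conj)) for conj in index]


# 40 atomic constraints, conjunctions of size up to d = 3:
index = conjunction_index(40, 3)
print(len(index), sum(comb(40, k) for k in range(1, 4)))  # prints: 10700 10700
```

The count grows roughly as the number of atoms to the power d, which is why the explicit projection stays tractable only for the small n and d that a human DM can realistically handle.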


6. Related work 


The problem of automatically learning utility functions and eliciting preferences is widely studied within the Artificial Intelligence community [25, 26]. Different approaches have been proposed to make decisions with partial preference information during the elicitation process. The uncertainty in the utility function is usually represented either by a set of feasible utility functions (reasoning under strict uncertainty) [?], which is narrowed down when additional preference information is elicited, or by a probability distribution over possible utility functions (Bayesian approach) [3, ?, ?, 9], which is refined when additional knowledge of the DM preferences is obtained. Finally, a recent line of research developed within the Constraint Programming community shares with CLEO the combinatorial formulation of the DM utility function (constraint-based preference elicitation). In the following, these approaches to preference elicitation are reviewed and compared with CLEO. We also motivate the choice of the Bayesian method introduced by Guo and Sanner in [3] as the benchmark algorithm in our experiments and summarize its main features. A more detailed description and discussion of the state-of-the-art methods for preference elicitation can be found in the Appendix.
6.1. Strict uncertainty

A popular approach to modelling uncertain knowledge about the DM preferences consists of assuming a set of hypotheses, with no belief about their strength. The set of hypotheses contains the feasible utility functions and reflects the partial knowledge about the DM preferences. The uncertainty about the DM preferences is decreased by restricting the feasible hypothesis set when relevant preference information is received during the elicitation process. This approach is often referred to as reasoning under strict uncertainty [25].

The minimax regret criterion [27] from statistical decision theory provides a way to make decisions under uncertainty. Given a certain decision A, the maximum regret is the difference in utility between the DM most preferred solution A* and A in the worst-case scenario, where the DM utility is the one in the feasible set for which this difference is maximal. By adopting the minimax regret criterion, the decision that minimizes this maximum regret is taken. This criterion therefore suggests a decision that is robust with respect to the worst possible case. Recent work [?] introduces an approach to preference elicitation based on the minimax regret criterion. Queries to be asked to the DM are selected so as to reduce the minimax regret by restricting the feasible hypothesis set. An advantage of minimax regret approaches with respect to our formulation is that they can provide theoretical guarantees in terms of bounds on the solution quality and convergence to provably optimal results. On the other hand, these approaches assume perfect feedback from the DM and cannot handle the imprecise and contradictory information that is typical of interactions with human DMs. Therefore, they are not suitable for the realistic preference elicitation tasks considered in this work.
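The minimax regret criterion is easy to state for a finite feasible set of utility functions and a finite set of alternatives. The sketch below stores the utilities as a matrix `utilities[u][a]`; this representation and the toy numbers are assumptions for illustration and do not reproduce the query-selection machinery of the cited work.

```python
from typing import Sequence


def max_regret(decision: int, utilities: Sequence[Sequence[float]]) -> float:
    """Worst-case loss of `decision`; utilities[u][a] is the utility of
    alternative a under feasible utility function u."""
    return max(max(u) - u[decision] for u in utilities)


def minimax_regret(utilities: Sequence[Sequence[float]]) -> int:
    """Index of the alternative with the smallest maximum regret."""
    n_alternatives = len(utilities[0])
    return min(range(n_alternatives), key=lambda a: max_regret(a, utilities))


# Two feasible utility functions over three alternatives.
U = [[10.0, 6.0, 0.0],
     [0.0, 6.0, 10.0]]
print([max_regret(a, U) for a in range(3)], minimax_regret(U))
# prints: [10.0, 4.0, 10.0] 1
```

The compromise alternative is chosen because it is never far from the best one, whichever feasible utility turns out to be the true one.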

6.2. Bayesian uncertainty 

An alternative uncertainty model (Bayesian approaches) consists of defining a probability distribution (or belief) over the candidate utility functions [?, 3, ?, 9]. The probabilistic framework offers a flexible approach to preference elicitation, handling the uncertainty in both the utility and the DM feedback. The expected utility of a configuration is defined as the average utility computed with respect to the probability distribution over the utility functions. The configuration maximizing the expected utility is usually recommended to the user. Therefore, under the Bayesian paradigm, robust decisions are taken to minimize risk in expectation. Queries are asked to the DM in order to increase the posterior probability of her utility. The probabilistic framework makes it possible to estimate the informativeness of the candidate queries. At each stage of the preference elicitation process the maximally informative query is asked. The maximum expected loss (MEL) of taking a decision A is the maximum expected reduction in utility when choosing A instead of the DM most preferred solution A*, where the expectation is taken over the probability distribution of the utility functions. The value of information (VOI) criterion suggests the query generating the largest expected reduction in MEL. Exact computation of VOI, as well as exact computation of the posterior distribution over utility functions given the feedback, is extremely expensive. State-of-the-art approaches [?] resort to approximate solutions.
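The Bayesian quantities above can be approximated from samples of the weight vector. The sketch below assumes a linear utility $u_{\mathbf{w}}(x) = \mathbf{w}^\top x$ and a fixed list of samples standing in for draws from the posterior; it estimates the expected utility and, as one common formalization of the loss, $\mathbb{E}_{\mathbf{w}}[\max_{x'} u_{\mathbf{w}}(x') - u_{\mathbf{w}}(x)]$. The exact MEL and VOI definitions and approximations of the cited methods may differ.

```python
from statistics import fmean
from typing import Sequence

Vec = Sequence[float]


def utility(w: Vec, x: Vec) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def expected_utility(x: Vec, samples: Sequence[Vec]) -> float:
    """Monte Carlo estimate of E_w[u_w(x)] from posterior samples of w."""
    return fmean(utility(w, x) for w in samples)


def expected_loss(x: Vec, candidates: Sequence[Vec], samples: Sequence[Vec]) -> float:
    """E_w[max_x' u_w(x') - u_w(x)]: expected utility lost by choosing x."""
    return fmean(max(utility(w, c) for c in candidates) - utility(w, x)
                 for w in samples)


candidates = [[1.0, 0.0], [0.0, 1.0]]
samples = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]  # stand-in for draws from Pr(w)
best = max(candidates, key=lambda x: expected_utility(x, samples))
print(best, expected_loss(best, candidates, samples))
```

The recommended candidate maximizes expected utility, yet it still carries a non-zero expected loss, which is the quantity that well-chosen queries aim to reduce.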

The closest approach to CLEO is the Bayesian method introduced by Guo and Sanner in [3] (referred to as GSM). Indeed, unlike the techniques based on minimax regret and on the constraint satisfaction formalism, CLEO and GSM satisfy all the main principles [3] needed for practical applicability of preference elicitation (see Sec. 5).

The GSM algorithm [3] searches for the configuration preferred by the DM within a given set of candidates. The configurations are described by n discrete attributes $x_1, \ldots, x_n$, where the k-th attribute is assigned values from a finite set $X_k$ with cardinality $|X_k|$. The user utility functions are represented by a weight vector w with dimension $\sum_{k=1}^{n} |X_k|$, specifying the utility of each attribute value in $X_k$ for each attribute k. This modelling choice assumes preferential independence among the attributes.
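The additive model used by GSM can be sketched with a one-hot encoding: each attribute contributes one indicator block of length $|X_k|$, and the utility is the sum of the weights of the selected values. Encoding attribute values as integer indices and the example weights are assumptions made for illustration.

```python
from typing import List, Sequence


def one_hot(config: Sequence[int], cardinalities: Sequence[int]) -> List[int]:
    """Concatenate one indicator block per attribute; length = sum of |X_k|."""
    vec: List[int] = []
    for value, size in zip(config, cardinalities):
        block = [0] * size
        block[value] = 1  # value is the index of the chosen element of X_k
        vec.extend(block)
    return vec


def additive_utility(w: Sequence[float], config: Sequence[int],
                     cardinalities: Sequence[int]) -> float:
    """Sum of the per-value weights: preferential independence in action."""
    return sum(wk * xk for wk, xk in zip(w, one_hot(config, cardinalities)))


cards = [2, 3]  # |X_1| = 2, |X_2| = 3, so w has 5 entries
w = [0.0, 1.0, 0.5, 0.2, 2.0]
print(one_hot((1, 2), cards), additive_utility(w, (1, 2), cards))
# prints: [0, 1, 0, 0, 1] 3.0
```

Because each attribute contributes independently, such a model cannot express interactions between attributes, which CLEO captures through conjunctions of atomic constraints.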

The uncertainty about the user preferences is represented by considering the weight vector w as a multivariate continuous random variable and by maintaining a probability distribution Pr(w), which is incrementally refined. Different strategies are defined to select the query to be asked at each refinement stage. Since GSM asks the DM for pairwise comparisons, in principle the VOI of each possible pairwise comparison has to be estimated. This query strategy, termed informed VOI, thus scales quadratically with the number of configurations, and its computational cost is affordable for small search spaces only. In the experiments reported in [3], as few as 20 configurations already prevent its application, even if the probabilities of the two possible answers to a pairwise comparison are assigned fixed arbitrary values (uninformed VOI strategy) rather than the values estimated from the elicited preference information. The computational load can be decreased by restricting the set of candidate pairwise comparisons, e.g., by fixing one element of each candidate pair to the configuration x* with the greatest expected utility (restricted informed VOI strategy). For scalability purposes, the authors also suggest an alternative query strategy which does not use the VOI criterion to rank a set of candidate comparisons. At each preference elicitation stage just one query is considered, namely the comparison between the configuration x* with the greatest expected utility and the solution maximizing the expected loss of recommending x* instead of it (simplified VOI strategy).

Unlike CLEO, GSM is conceived for instances characterized by purely discrete attributes, and cannot tackle preference elicitation tasks over hybrid domains. In our experiments (Sec. 7), an empirical comparison of CLEO with GSM is thus performed in a simplified experimental setting involving discrete decisional attributes only.

6.3. Constraint-based preference elicitation 

The work in [10] articulates the user preferences in terms of soft constraints and introduces constraint optimization problems where the DM preferences are not completely known before the solution process starts. In soft constraints, each assignment to the variables of one constraint is associated with a preference value taken from a preference set. The preference value represents the level of desirability of the assignment to the variables of the constraint. As the preference score is associated with a partial assignment to the problem variables, it represents a local preference value. The desirability of a complete assignment is defined by a global preference score, computed by applying a combination operator to the local preference values. A set of soft constraints generates an order (partial or total) over the complete assignments of the variables of the problem. Given two solutions of the problem, the preferred one is selected by computing their global preference levels. Preference elicitation strategies have been introduced [10] to deal with scenarios where preference information is partially unknown. Some of the local preference values attached to soft constraints are assumed to be missing, and the DM is asked for explicit feedback on specific assignments for these constraints, in terms of score values quantifying her preference for a certain assignment. Compared with this approach based on the Constraint Programming formalism, CLEO assumes a much more limited amount of initial knowledge about the problem at hand. In [10], decision variables, soft constraint topology and structure are assumed to be known in advance, and the incomplete initial information consists of missing local preference values only. CLEO assumes complete ignorance about the structure of the constraints over the decisional variables of the user. The initial problem knowledge is limited to a set of catalog attributes. CLEO extracts the decisional items of the DM from the set of catalog attributes and learns the weighted constraints built from them that model the DM preferences.
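The interplay between local and global preference values can be sketched as a fold of the local values with a combination operator. The snippet below shows two common choices, min (fuzzy constraints) and sum (additive preferences); the table representation, the variable names and the numbers are illustrative assumptions, not the formalism of [10] in full generality.

```python
import operator
from typing import Callable, Dict, Sequence, Tuple

# A soft constraint: the variables in its scope plus a table of local preferences.
SoftConstraint = Tuple[Tuple[str, ...], Dict[Tuple[int, ...], float]]


def global_preference(assignment: Dict[str, int],
                      constraints: Sequence[SoftConstraint],
                      combine: Callable[[float, float], float],
                      unit: float) -> float:
    """Fold the local preference values of a complete assignment with `combine`."""
    score = unit
    for scope, table in constraints:
        local = table[tuple(assignment[v] for v in scope)]  # local preference
        score = combine(score, local)
    return score


constraints = [
    (("x", "y"), {(0, 0): 0.9, (0, 1): 0.4, (1, 0): 0.7, (1, 1): 1.0}),
    (("y",), {(0,): 0.6, (1,): 0.8}),
]
for sol in ({"x": 1, "y": 1}, {"x": 0, "y": 0}):
    fuzzy = global_preference(sol, constraints, min, unit=1.0)
    additive = global_preference(sol, constraints, operator.add, unit=0.0)
    print(sol, fuzzy, round(additive, 2))
```

In [10] some table entries are missing and must be elicited one assignment at a time; CLEO instead learns the analogous weights from comparisons of complete solutions.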

Furthermore, the technique in [10] is based on local elicitation queries, with the final user asked to reveal her preferences about assignments for specific soft constraints. Global preferences, or bounds on global preferences, associated with complete solutions of the problem are derived from the local preference information. CLEO goes in the opposite direction: it asks the user to compare complete solutions and learns local utilities (i.e., the weights of the soft constraints of the logic formula) from global preference values. In many cases, recognizing appealing or unsatisfactory global solutions may be much easier than defining local utility functions associated with partial solutions. For example, when scheduling a set of activities, the evaluation of complete schedules may be more affordable than assessing how specific ordering choices between pairs of activities contribute to the global preference value. Furthermore, the algorithm in [10] asks the DM for quantitative evaluations of partial solutions: she does not just rank pairs of activities, but provides score values quantifying her preference for the partial activity rankings, a much more demanding task. Finally, the approach in [10] assumes consistent and accurate quantitative feedback from the DM. Under this assumption, the optimality of the recommended solution is guaranteed. However, this approach cannot be applied in our realistic experimental setting, which is characterized by noisy human feedback.

7. Experimental results 

The following empirical evaluation demonstrates that CLEO can handle realistic preference elicitation tasks defined over hybrid domains and with uncertain human feedback. No alternative algorithm capable of tackling these preference elicitation tasks is currently available (see Sec. 6). To overcome this limitation, our experimental work consists of two phases. First, CLEO is tested on a couple of realistic preference elicitation tasks with the above features. For this purpose, a benchmark of Max-SMT problems is defined, involving both discrete and continuous decisional variables. In a second step, a set of simplified synthetic problems with discrete decisional variables only is introduced, in order to compare CLEO with existing preference elicitation algorithms. In particular, we consider Boolean decisional attributes only and generate a set of synthetic Maximum Satisfiability (Max-SAT) benchmarks. In this simplified setting, the benchmark preference elicitation algorithm is the method by Guo and Sanner [3].

For the experiments performed, the mapping function ψ in CLEO projects configurations into the space of all possible conjunctions of up to three atomic constraints (i.e., d = 3). The next section describes the well-known noisy response model used in both the Max-SMT and Max-SAT experiments to simulate the inaccurate and inconsistent feedback provided by the DM during the preference elicitation process.

7.1. Noisy response model for human feedback 

In the experiments, the feedback from the user is assumed to be affected by inaccuracies and inconsistencies. The user ranks configurations A based on a latent utility function f(A). In particular, configuration $A^i$ is preferred to configuration $A^j$, i.e., $A^i \succ A^j$, if and only if $f(A^i) > f(A^j)$. However, each evaluation $f_i = f(A^i)$ is corrupted by additive independent and identically distributed (IID) Gaussian noise $\varepsilon_i \sim \mathcal{N}(0, \sigma_{\text{noise}}^2)$, resulting in a noisy utility value $y_i = f_i + \varepsilon_i$.

Under the assumption of independent and identically distributed Gaussian noise, the probability that the user prefers configuration $A^i$ to configuration $A^j$ is defined as follows:

$$
P(A^i \succ A^j \mid f_i, f_j) = P(y_i > y_j \mid f_i, f_j) = P(f_i + \varepsilon_i > f_j + \varepsilon_j) = P(\varepsilon_i - \varepsilon_j > f_j - f_i)
\tag{3}
$$

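The quantity $\delta = \varepsilon_i - \varepsilon_j$ is the difference of two IID Gaussian variables with zero mean and variance $\sigma_{\text{noise}}^2$, and therefore follows the Gaussian distribution $\mathcal{N}(0, 2\sigma_{\text{noise}}^2)$. By computing the standardized variable $z = \delta / (\sqrt{2}\,\sigma_{\text{noise}})$, Eq. (3) can be rewritten as:

$$
P(A^i \succ A^j \mid f_i, f_j) = 1 - \Phi\left(\frac{f_j - f_i}{\sqrt{2}\,\sigma_{\text{noise}}}\right) = \Phi\left(\frac{f_i - f_j}{\sqrt{2}\,\sigma_{\text{noise}}}\right)
\tag{4}
$$

where Φ is the cumulative distribution function of the standard normal distribution.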
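The noisy response model can be simulated directly, which is how the DM is emulated in the experiments. The sketch below implements the closed form of Eq. (4), writing Φ through `math.erf`, and checks it against a simulated DM that compares utilities corrupted by IID Gaussian noise. The function names are hypothetical; σ_noise = 10 matches the experimental setting described below.

```python
import math
import random


def prob_prefers(f_i: float, f_j: float, sigma_noise: float) -> float:
    """Eq. (4): P(A_i preferred to A_j) = Phi((f_i - f_j) / (sqrt(2) * sigma))."""
    z = (f_i - f_j) / (math.sqrt(2.0) * sigma_noise)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # standard normal CDF


def noisy_answer(f_i: float, f_j: float, sigma_noise: float,
                 rng: random.Random) -> bool:
    """Simulated DM: compares utilities corrupted by IID Gaussian noise."""
    return f_i + rng.gauss(0.0, sigma_noise) > f_j + rng.gauss(0.0, sigma_noise)


rng = random.Random(0)
trials = 20000
hits = sum(noisy_answer(30.0, 10.0, 10.0, rng) for _ in range(trials))
# The analytic value is about 0.92; the empirical frequency should be close to it.
print(round(prob_prefers(30.0, 10.0, 10.0), 3), hits / trials)
```

Equal latent utilities give a preference probability of exactly 0.5, and swapping the two configurations gives complementary probabilities, as expected from Eq. (4).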

The above user response model, linking pairwise comparisons to a continuous latent utility function, has been widely used in economic and psychological studies to describe the individual choice behaviour of humans [?]. It is known as the Thurstone-Mosteller or Probit model. In our experimental setting, $\sigma_{\text{noise}}$ is fixed to 10, so that noise values are comparable with the latent utility values f(A).

7.2. Realistic preference elicitation tasks over hybrid domains 

CLEO is tested on a benchmark of Max-SMT problems formulating realistic preference elicitation tasks. The Max-SMT tool used for the experiments is the “Yices” solver [21] (version 1.0), which is publicly available at http://yices.csl.sri.com/ (as of August 2015). Each point of the curves depicting our results is the median value over 400 runs with different random seeds.

Max-SMT is a recent research area. Even if existing results [?] indicate that Max-SMT solvers can efficiently address real-world problems, to the best of our knowledge no well-established publicly available Max-SMT benchmarks exist, and preference elicitation tasks have not yet been encoded as Max-SMT instances.

In this work, we modelled a scheduling problem as a Max-SMT instance, where the DM expresses her preferences about the candidate schedules of a set of jobs. In the spirit of real-world recommendation tasks, we also designed a housing problem aimed at selecting a location for building a house. The formulation consists of both unknown soft constraints representing the user preferences and known hard constraints defining the feasible search space. The housing problem is challenging, due to complex non-linear relationships among decision variables. For example, the variable encoding the cost of the location is defined as a function of the remaining decision variables. The results obtained by CLEO on both preference elicitation tasks are discussed below.

7.2.1. Scheduling problem 

A set of five jobs must be scheduled over a given period of time. Each job has a fixed, known duration, and the atomic constraints define the overlap of two jobs or their non-concurrent execution. The user's unknown utility function is generated by selecting weighted conjunctions of atomic constraints uniformly at random. The solution of the problem is a schedule assigning a starting date to each job and maximizing the utility, where the utility of a schedule is the sum of the weights of the satisfied constraints of the user utility function. The atomic soft constraints define temporal constraints using the theory of difference arithmetic. In detail, let $s_i$ and $d_i$, with $i = 1, \ldots, 5$, be the starting date and the duration of the i-th job, respectively. If job i is scheduled before job j, the constraint expressing the overlap of the two jobs is $s_j - s_i < d_i$, while their non-concurrent execution is encoded by $s_j - s_i \ge d_i$. Note that there are 40 possible constraints for a set of 5 jobs (two for each of the 20 ordered pairs of jobs). The maximum size of the soft constraints is assumed to be three. The weights of the soft constraints are distributed uniformly at random in the range [1, 100].
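The 40 atomic constraints of the scheduling problem can be enumerated programmatically. The sketch below generates the two difference-logic atoms for every ordered pair of jobs and evaluates them on a concrete schedule; encoding each atom literally for all ordered pairs (rather than only for pairs where job i actually starts first), the atom labels and the example schedule are assumptions made for illustration.

```python
from itertools import permutations
from typing import List, Sequence, Tuple

Atom = Tuple[str, int, int]


def atomic_constraints(num_jobs: int) -> List[Atom]:
    """Two difference-logic atoms per ordered pair (i, j) of distinct jobs."""
    atoms: List[Atom] = []
    for i, j in permutations(range(num_jobs), 2):
        atoms.append(("overlap", i, j))   # s_j - s_i <  d_i
        atoms.append(("disjoint", i, j))  # s_j - s_i >= d_i
    return atoms


def holds(atom: Atom, start: Sequence[float], duration: Sequence[float]) -> bool:
    kind, i, j = atom
    gap = start[j] - start[i]
    return gap < duration[i] if kind == "overlap" else gap >= duration[i]


atoms = atomic_constraints(5)
start, duration = [0.0, 2.0, 10.0, 4.0, 7.0], [3.0, 1.0, 2.0, 2.0, 5.0]
print(len(atoms), sum(holds(a, start, duration) for a in atoms))
```

Since the two atoms of each ordered pair are complementary, exactly one of them holds for any schedule; a soft constraint of the DM utility is then a weighted conjunction of up to three such atoms.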

CLEO is tested on a benchmark of randomly generated utility functions according to the pair (number of decisional features, number of soft constraints). The decisional features are the atomic constraints appearing in the soft constraints. We generate functions for the following values: {(5, 3), (10, 6), (15, 9)}. Each DM utility has at least two soft constraints of size three. Let us underline once more that utility functions with more than a few factors, or with factors containing many terms, are unrealistic when considering a human DM [?].

Figure 1: Performance of CLEO in solving the scheduling problem. The y-axis reports the percentage utility loss, while the x-axis contains the number of pairwise-comparison queries asked so far. The curve reports the median values observed over 400 runs of CLEO, while the shaded area denotes the range between the 25th and the 75th percentiles. Please note the different range of the x-axis in the case of nine soft constraints.

Results of the experiments are shown in Figure 1. The y-axis reports the percentage utility loss, measured as the deviation from the utility of the DM preferred solution, while the x-axis contains the number of pairwise comparisons asked so far. The curves report the median values observed over 400 runs, while the shaded area depicts the interquartile range, measuring the dispersion around the median.
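The quantities plotted in Figure 1 can be computed as follows. The sketch assumes that the percentage utility loss is the gap to the utility of the DM preferred solution divided by that utility (which is positive here, since soft-constraint weights lie in [1, 100]), and uses the default method of Python's `statistics.quantiles` for the quartiles; both choices, like the made-up utilities in the example, are assumptions rather than details taken from the experiments.

```python
from statistics import median, quantiles
from typing import Sequence, Tuple


def percentage_utility_loss(u_best: float, u_recommended: float) -> float:
    """Deviation from the utility of the DM preferred solution, in percent."""
    return 100.0 * (u_best - u_recommended) / u_best


def summarize(losses: Sequence[float]) -> Tuple[float, float, float]:
    """Median and 25th/75th percentiles of the losses observed across runs."""
    q1, _, q3 = quantiles(losses, n=4)
    return median(losses), q1, q3


# Made-up utilities of the recommended solution in five runs (optimum = 200).
losses = [percentage_utility_loss(200.0, u) for u in (200.0, 180.0, 160.0, 140.0, 120.0)]
print(losses, summarize(losses))
```

Reporting the median and interquartile range, rather than mean and standard deviation, keeps the curves robust to the occasional run in which the noisy feedback misleads the learner.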

As expected, the learning problem becomes more challenging as the number of soft constraints increases. However, results are promising, as a substantial improvement in the quality of the recommended solution is achieved by CLEO when additional queries are asked of the DM (anytime property). Furthermore, CLEO identifies the DM preferred solution in all cases. In detail, in the realistic cases of three and six soft constraints, fewer than 35 pairwise comparisons are asked of the DM to identify her preferred solution. With nine soft constraints, 64 pairwise comparisons are required on average to recommend the DM preferred solution. However, with 40 queries, a percentage utility loss within 5.5% is obtained. The shaded area shows that CLEO identifies the DM preferred solution quite consistently as the number of queries increases (the interquartile range is within 25% after 35 queries, even in the case of nine soft constraints).
7.2.2. Housing problem

We consider a customer planning to build her own house and judging 
potential housing locations provided by a real estate company (henceforth the 
housing problem). There are different locations available where the customer 
may potentially build her house. The locations are characterized by different 
housing values, prices, constraints about the design of the building (e.g., 
usually in the city center you cannot have a family house with a huge garden 
and pool), etc. The customer may formulate her judgments by considering a 
description of the housing locations based on a predefined set of parameters, 
including, e.g., crime rate, distance from downtown, location-based taxes and 
fees, public transit service quality, walking and cycling facilities, proximity 
to commercial facilities or green areas, etc. Many of these parameters may 
be uninformative, as they do not represent any decisional criterion for the 
customer. Furthermore, hard constraints defining the feasible locations may 
be specified in advance, e.g., cost bounds stated by the user or building design
requirements asserted by the company. 

In our experiments, the housing problem is formulated as follows. The set of catalog attributes is listed in Table 3. A set of ten hard constraints (Table 4), defining the feasible housing locations and known in advance, is considered. The hard constraints are stated by the customer (e.g., cost bounds) or by the company (e.g., constraints about the distance of the available locations from user-defined points of interest). Let us note that constraints 5, 6 and 7 define a linear bi-objective problem among distances from user-defined points of interest. Prices of potential housing locations are defined as a function of the other attributes. For example, the price increases if a semi-detached house rather than a flat is selected, or in the case of green areas in the neighborhood. Conversely, the price decreases when, e.g., the crime index of the location increases. Soft constraints are represented by weighted conjunctions of both predicates in the theory of linear arithmetic and Boolean variables (the latter in the case of attributes 2, 3, ..., 6 in Table 3). For example, one predicate may model the preference for a location whose distance from the nearest free parking is smaller than a given threshold, while a Boolean variable encodes, e.g., the aspiration for a house with a garage.
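To make the hybrid soft constraints concrete, the sketch below evaluates a utility defined as the sum of the weights of the satisfied soft constraints. Each soft constraint is a conjunction mixing linear-arithmetic predicates and Boolean attributes from Table 3. The thresholds, the weights and the dictionary encoding of a configuration are hypothetical choices made for illustration. CLEO itself passes such formulae to a Max-SMT solver rather than evaluating them in Python.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Union

Config = Dict[str, Union[bool, float]]  # attribute name -> value (hypothetical encoding)

@dataclass
class SoftConstraint:
    """Weighted conjunction of predicates over catalog attributes."""
    weight: float
    predicates: List[Callable[[Config], bool]]

    def satisfied(self, x: Config) -> bool:
        return all(p(x) for p in self.predicates)

def utility(soft_constraints: List[SoftConstraint], x: Config) -> float:
    """Sum of the weights of the soft constraints satisfied by x."""
    return sum(c.weight for c in soft_constraints if c.satisfied(x))

# Hypothetical predicates: two linear-arithmetic atoms and one Boolean attribute.
near_parking = lambda x: x["distance from nearest free parking"] < 0.5
has_garage = lambda x: x["garage"]
low_crime = lambda x: x["crime rate"] < 2.0

prefs = [SoftConstraint(40, [near_parking, has_garage]),
         SoftConstraint(25, [low_crime])]

flat = {"distance from nearest free parking": 0.3, "garage": True, "crime rate": 3.1}
print(utility(prefs, flat))  # 40
```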

We generated a set of 40 predicates, i.e., atomic constraints. The unknown utility function of the user is composed of soft constraints with two or three predicates, with at least one soft constraint having three predicates. The maximum number of predicates in a soft constraint is assumed to be known. The weights of soft constraints are integer values selected uniformly at random in the range [1, 100].

Table 3: Catalog attributes for the housing problem.

num  attribute                                           type
1    house type                                          ordinal
2    garden                                              Boolean
3    garage                                              Boolean
4    commercial facilities in the neighborhood           Boolean
5    public green areas in the neighborhood              Boolean
6    cycling and walking facilities in the neighborhood  Boolean
7    distance from downtown                              numerical
8    crime rate                                          numerical
9    location-based taxes and fees                       numerical
10   public transit service quality index                numerical
11   distance from high schools                          numerical
12   distance from nearest free parking                  numerical
13   distance from working place                         numerical
14   distance from parents house                         numerical
15   price                                               numerical

Figure 2 reports the results over a benchmark of 400 randomly generated utility functions for each of the following instantiations of the couple (number of decisional features, number of soft constraints): {(5, 3), (10, 6), (15, 9)}, where the decisional features are the predicates appearing in the soft constraints. The promising results observed for the scheduling problem are confirmed, even though the housing problem is much harder due to complex non-linear interactions among the decisional attributes. As the number of queries increases, the quality of the solution rapidly improves, and CLEO identifies the DM preferred configuration in all cases. On average, CLEO needs 22 and 69 queries to converge to the DM preferred solution in the case of three and nine soft constraints, respectively. Let us note again that utility functions involving nine soft constraints are quite unrealistic and are considered here just to test the scalability of CLEO.

The dispersion of the performance values keeps decreasing as the number of queries increases, showing that CLEO recommends better-quality solutions more consistently. However, in the case of three soft constraints, the interquartile range observed when CLEO converges is equal to 70.8%, and with 40 queries the dispersion only decreases to 45.4%. These values are rather large. A deeper investigation of the CLEO results revealed that the observed dispersion is heavily affected by some runs where the solution quality does not improve when additional feedback is asked to the DM. In these runs, CLEO cannot generate queries informative enough to recover from suboptimal initial choices. Smarter query strategies could be studied in order to tackle these cases, as discussed in Sec.

Table 4: Hard feasibility constraints for the housing problem. Parameters p_i, i = 1, ..., 13, are threshold values specified by the user or by the sales personnel, depending on who states the hard constraint they refer to.

num  hard constraint
1    price < p_1
2    location-based taxes and fees < p_2 ⇒ not public green areas in the neighborhood and not public transit service quality index < p_3
3    commercial facilities in the neighborhood ⇒ not (garden and garage)
4    crime rate < p_4 ⇒ distance from downtown > p_5
5    distance from working place + distance from parents house > p_6
6    distance from working place + distance from high schools > p_7
7    distance from parents house + distance from high schools > p_8
8    distance from nearest free parking < p_9 ⇒ not public green areas in the neighborhood
9    distance from parents house < p_10 ⇒ distance from downtown > p_11 and crime rate > p_12
10   garden ⇒ house type > p_13
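Most hard constraints in Table 4 are implications, which can be checked as material implications on a candidate configuration. The sketch below covers constraints 1, 3 and 10 only. The threshold values in P, the integer encoding of the ordinal house type and the dictionary keys are assumptions made for illustration.

```python
def implies(a: bool, b: bool) -> bool:
    """Material implication: a => b is false only when a holds and b does not."""
    return (not a) or b

# Hypothetical thresholds p_1 and p_13; in the paper they are set by the user or the company.
P = {1: 300_000.0, 13: 2}

def feasible(x: dict) -> bool:
    """Check hard constraints 1, 3 and 10 of Table 4 on configuration x."""
    return all([
        x["price"] < P[1],                                                       # constraint 1
        implies(x["commercial facilities"], not (x["garden"] and x["garage"])),  # constraint 3
        implies(x["garden"], x["house type"] > P[13]),                           # constraint 10
    ])

house = {"price": 250_000.0, "commercial facilities": True,
         "garden": True, "garage": False, "house type": 3}
print(feasible(house))  # True
```

In CLEO these constraints would instead be asserted as hard clauses of the Max-SMT problem, so that only feasible configurations can be recommended.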

7.3. Experimental comparison with the state-of-the-art 

Figure 2: Performance of CLEO while solving the housing problem.

Since existing methods cannot handle the preference elicitation tasks over hybrid domains defined in the previous section, we focus on Boolean attributes only for the comparison with the state of the art. With this choice, the atomic constraints are just the Boolean attributes, and the more complex soft constraints expressing the DM preferences are Boolean terms in plain propositional logic. That is, each soft constraint is the conjunction of (up to three) Boolean attributes, and the unknown DM utility function is a weighted Maximum Satisfiability (Max-SAT) instance consisting of the weighted combination of the Boolean terms. The benchmark algorithm is the GSM method [3] described in Sec.
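In this Boolean setting the DM utility reduces to a weighted Max-SAT objective. The sketch below assumes terms are conjunctions of positive literals (attribute indices and weights are hypothetical) and finds the best configuration by exhaustive enumeration. This is only viable for tiny problems such as the 32 configurations of the five-attribute case, and it is a stand-in for, not a description of, the solvers used by CLEO or GSM.

```python
from itertools import product

def maxsat_utility(terms, x):
    """Weighted Max-SAT utility: sum of the weights of the satisfied terms.
    terms is a list of (weight, attribute_indices); a term is satisfied when all
    listed Boolean attributes are True in x (positive literals only, by assumption)."""
    return sum(w for w, idx in terms if all(x[i] for i in idx))

def best_configuration(terms, n):
    """Exhaustive search over all 2**n Boolean configurations (small n only)."""
    return max(product((False, True), repeat=n),
               key=lambda x: maxsat_utility(terms, x))

# Hypothetical DM utility over five Boolean attributes (indices 0..4).
terms = [(30, (0, 1, 2)), (-20, (3,)), (15, (1, 4))]
best = best_configuration(terms, 5)
print(best, maxsat_utility(terms, best))  # (True, True, True, False, True) 45
```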

A benchmark of random utility functions is generated for (number of Boolean attributes, number of terms) equal to {(5, 3), (10, 6), (15, 9)}. Each utility function has two constraints of maximum size (three). Constraint weights are integers selected uniformly at random in the interval $[-100, 0) \cup (0, 100]$.
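A generator for such benchmark utilities might look as follows. This sketch rests on several assumptions. Exactly two terms have the maximum size of three, which is our reading of the sentence above. The sizes of the remaining terms are drawn uniformly from [1, 2], and the attributes within a term are distinct. Apart from the weight interval, none of these sampling details comes from the paper.

```python
import random

def random_utility(n_attributes, n_terms, max_size=3, n_max_size=2, rng=None):
    """Random weighted Max-SAT utility as a list of (weight, attribute_indices).
    Assumed sampling scheme: the first n_max_size terms have max_size attributes,
    the others have a size drawn uniformly in [1, max_size - 1]."""
    if n_terms < n_max_size:
        raise ValueError("need at least n_max_size terms")
    rng = rng or random.Random()
    nonzero_weights = [w for w in range(-100, 101) if w != 0]  # [-100, 0) U (0, 100]
    terms = []
    for k in range(n_terms):
        size = max_size if k < n_max_size else rng.randint(1, max_size - 1)
        attrs = tuple(sorted(rng.sample(range(n_attributes), size)))
        terms.append((rng.choice(nonzero_weights), attrs))
    return terms

# One utility for the (10, 6) test case, with a fixed seed for reproducibility.
terms = random_utility(10, 6, rng=random.Random(0))
```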

All the query selection strategies suggested in [3] for the GSM method have been tested in our experimental setting. For each of the three test cases {(5, 3), (10, 6), (15, 9)}, we report here the results of the best-performing query strategy. However, with more than five attributes, the most sophisticated Bayesian query strategies proposed in [3] are too slow, as pointed out by the authors themselves and empirically verified in our preliminary experiments. They have thus been included in the (5, 3) case only. Based on our results, the best query strategies are the “restricted informed value of information (VOI)” for the test case (5, 3) and the “simplified VOI” for both remaining test cases.

Figure 3 reports the percentage utility loss of the recommended configuration w.r.t. the DM preferred solution for an increasing number of pairwise comparisons asked so far. The curves report the median values observed over 200 runs for CLEO (darker solid line, or blue solid line if viewed in colour) and GSM (lighter dashed line, or red dashed line if viewed in colour). The shaded areas depict the interquartile range measuring the dispersion around the median.

Figure 3: Performance of the CLEO (darker solid line, or blue solid line if viewed in colour) and GSM (lighter dashed line, or red dashed line if viewed in colour) algorithms over the Boolean problems. The y-axis reports the percentage utility loss, while the x-axis contains the number of pairwise comparisons asked so far. The curves report the median values observed over 200 runs, while the shaded area denotes the range between the 25th and the 75th percentiles of the observations. Best viewed in colour.

The search space of the simplest problem, with five Boolean attributes, contains just 32 candidate configurations, so any strategy asking more than a few questions is not competitive with naive exhaustive search. On average, CLEO and GSM ask the DM seven and nine queries, respectively, to discover her preferred solution. However, with 12 or fewer queries, the performances of CLEO and GSM are statistically equivalent under a two-sided Wilcoxon signed-rank test with a Bonferroni-corrected significance level of 10“^. With more than 12 queries, there is statistical evidence for better results by CLEO, due to the much more unstable behavior of the GSM method: after 14 queries, CLEO consistently identifies the DM preferred solution with a null interquartile range (IQR), while the IQR of the GSM results remains above 16.6%.
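For readers who want to reproduce this kind of comparison, the sketch below implements a two-sided Wilcoxon signed-rank test and a Bonferroni correction in plain Python. It assumes that the CLEO and GSM losses are paired by benchmark utility function. It also uses the large-sample normal approximation without tie correction, so its p-values can differ slightly from exact or tie-corrected implementations. Applying it once per number of queries is our illustration, not the authors' script.

```python
import math

def wilcoxon_two_sided(a, b):
    """Two-sided Wilcoxon signed-rank test on paired samples a, b.
    Normal approximation, no tie correction; zero differences are dropped."""
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    if n == 0:
        return 1.0
    # Rank |d| from 1 to n, giving tied magnitudes their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value

def bonferroni_significant(p_values, alpha):
    """Bonferroni correction: each of the m tests is run at level alpha / m."""
    m = len(p_values)
    return [p < alpha / m for p in p_values]

# Hypothetical paired losses (%) of CLEO and GSM after k queries on the same utilities.
cleo = [0.0, 0.0, 2.5, 0.0, 1.0, 0.0, 0.0, 3.0]
gsm = [5.0, 12.0, 2.5, 8.0, 9.5, 4.0, 16.0, 7.0]
p = wilcoxon_two_sided(cleo, gsm)
```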

The more challenging test cases are represented by the problems with 10 
and 15 Boolean attributes, where the search space size is 1024 and 32768, 
respectively, preventing the application of exhaustive search techniques. In 
both these cases, the performance of CLEO is much better than that of GSM. 

In detail, with 10 Boolean attributes, CLEO asks the DM 25 pairwise comparisons on average to identify her favourite solution, while the average percentage utility loss of the configuration recommended by GSM remains above 10% even after 50 queries. With 16 queries, the CLEO curve is within 2%, against a value of around 19% observed for GSM. The performance difference between CLEO and GSM is significant at the 10“^ level after eight queries, and the significance level goes to 10“^^ after 15 queries.

An analogous situation is observed for the (15, 9) test case. The solution returned by CLEO has an average loss of less than 2% after 26 queries and less than 1% after 38 queries. On the other hand, after 50 queries, GSM still recommends solutions with an average loss above 22.3%. After the first seven queries, the performance difference is statistically significant at the 10“^ level, which goes to 10“^° after ten queries.

8. Conclusions 

This paper introduces CLEO, a preference elicitation algorithm that, unlike existing approaches, handles preference elicitation tasks defined over hybrid domains and with uncertain human feedback. A combinatorial formulation of the unknown DM utility function is adopted. CLEO consists of an incremental procedure that iteratively optimizes the learned approximation of the DM utility function to generate candidate solutions, and refines the approximation based on the human feedback received. Simple pairwise comparison queries are asked to the DM.

CLEO assumes very limited initial knowledge. In detail, since different decision makers usually have different decisional criteria, the algorithm just assumes a set of catalog attributes describing the candidate configurations. The DM preferences are expressed by soft constraints over the attribute values. However, only a small subset of catalog attributes (and, consequently, of the soft constraints defined on them) may be relevant for a specific DM, resulting in a learning setting that is sparse both in the number of relevant attributes and in the number of soft constraints. The algorithm employs 1-norm regularization, which enforces sparsity of the learned function, in order to identify the relevant attributes and constraints.
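The sparsifying effect of the 1-norm can be illustrated with a pairwise ranking objective. The objective is the average hinge loss over DM comparisons plus lam times the 1-norm of the weights. It is minimized by proximal subgradient steps whose soft-thresholding sets small weights exactly to zero. This is a sketch of the general technique with hypothetical hyperparameters (lam, lr, epochs). It is not necessarily the learner or optimizer used in CLEO.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |v|: shrinks v towards zero, zeroing |v| <= t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def learn_sparse_ranker(pairs, n_features, lam=0.1, lr=0.05, epochs=200):
    """pairs: list of (phi_preferred, phi_other) feature vectors from DM answers.
    Minimizes mean pairwise hinge loss + lam * ||w||_1 by proximal subgradient steps."""
    w = [0.0] * n_features
    for _ in range(epochs):
        grad = [0.0] * n_features
        for better, worse in pairs:
            diff = [b - c for b, c in zip(better, worse)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            if margin < 1.0:  # hinge loss active: its subgradient is -diff
                for k in range(n_features):
                    grad[k] -= diff[k] / len(pairs)
        # Gradient step on the hinge part, then soft-thresholding for the 1-norm.
        w = [soft_threshold(wi - lr * gi, lr * lam) for wi, gi in zip(w, grad)]
    return w

# One hypothetical comparison: feature 0 separates the pair, features 1-2 barely differ.
pairs = [([1.0, 0.5, 0.2], [0.0, 0.45, 0.25])]
w = learn_sparse_ranker(pairs, n_features=3)
print(w[1], w[2])  # 0.0 0.0
```

The weights of the uninformative features stay exactly at zero, which is the behaviour that lets the learner pick a few relevant features out of a large candidate set.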

The learned function is a set of weighted soft constraints involving both discrete and continuous-valued attributes. The configuration maximizing the sum of the weights of the satisfied constraints is recommended to the DM. To identify this configuration, a Max-SMT solver is used. CLEO is a generic framework, enabling the adoption of well-assessed learning methods and Max-SMT solvers.

Experimental results on realistic preference elicitation tasks demonstrate the effectiveness of CLEO in focusing towards the optimal solutions, its robustness, and its ability to recover from suboptimal initial choices.

Our experiments involve preference elicitation tasks over hybrid domains, with uncertain human feedback, (known) hard constraints limiting the set of feasible configurations, and complex non-linear interactions among the decisional attributes (e.g., the cost attribute in the case of the housing problem). CLEO has also been compared with a state-of-the-art Bayesian preference elicitation approach in a simplified setting with purely discrete attributes. The experimental results show that CLEO outperforms the benchmark algorithm, with the performance difference becoming more pronounced as the complexity of the preference elicitation task increases.

CLEO can be generalized in a number of directions. The learning stage 
employs a ranking loss function based on pairwise preference evaluation. 
More complex ranking losses have been proposed in the literature (see for 
instance [32]), especially to increase the importance of correctly ranking the 
highest scoring solutions, and could be combined with 1-norm regularization. 

Active learning is a very active research area, and a broad range of approaches has been proposed (see [33] for a review). The simplest and most common framework is that of uncertainty sampling: the learner queries the instances on which it is least certain. However, the ultimate goal of a recommendation or optimization system is selecting the best instance(s) rather than correctly modeling the underlying utility function. The query strategy should thus tend to suggest good candidate solutions while still learning as much as possible from the feedback received. Typical areas where research on this issue is quite popular are single- and multi-objective interactive optimization [31] and information retrieval [35]. The need to trade off multiple requirements in this active learning setting is addressed in [36], where the authors consider relevance, diversity and density in selecting candidates. Our future research will consider the application of these active learning techniques. Indeed, the performance of our method depends on the trade-off between the identification of candidate solutions satisfying the DM (i.e., solutions optimizing the current learned preference model) and the generation of informative training examples for the subsequent refinement of the learned model.

In the context of preference elicitation, Bayesian approaches are attractive as they quantify the uncertainty in the learned DM utility models and provide a principled approach to estimating the value of the information obtained by asking a certain query to the DM. In particular, the value of information estimates the extent to which a certain query helps improve the quality of the learned preference model. The value of information is exploited to design efficient query strategies consisting of informative queries; see, e.g., the GSM [3] algorithm we use as benchmark in the experimental comparisons. Adapting these concepts to our setting, where the utility function is defined over hybrid domains and models complex non-linear interactions between attributes, is highly non-trivial, as our comparisons suggest (see Section 7.3). This is an interesting and challenging direction for future research.

Another research direction is the extension of our approach to handle feedback from multiple DMs BTl. In particular, an interesting case study is the exploitation of the preferences of previous DMs to minimize the elicitation effort for a new user mi. We also plan to extend our algorithm to tackle preference drift [32], i.e., the tendency of the DM to change her preferences during the interactive utility elicitation process. In our combinatorial utility setting, DM preference drift can be modelled by soft-constraint weights evolving over time and by gradually changing logic formulae (e.g., the Boolean term $x_1 \wedge x_2$ becoming $x_1 \wedge x_2 \wedge x_3$ when the DM realizes she has a more complex requirement).

Finally, this paper focused on preference elicitation tasks involving the small-scale problems typical of an interaction with a human DM. From a more general perspective, CLEO provides a framework for the joint learning and optimization of unknown combinatorial functions involving both discrete and continuous decision variables. In principle, when combined with appropriate SMT solvers, CLEO could be applied to large combinatorial optimization problems whose formulation is only partially available (e.g., those arising from industrial applications of combinatorial optimization [39]). However, requiring an explicit representation of all possible combinations of predicates (even if limited to the unknown part) would rapidly produce an explosion of computational and memory requirements. One option is to resort to an implicit representation of the function to be optimized, like the kernelized one we used in [15] when learning quantitative scores. As our previous results seem to indicate [15], this can degrade the quality of the returned solutions when the utility function is very sparse. Kernelized versions of zero-norm regularization [IH] could be tried in order to enforce sparsity in the projected space if needed. Let us note, however, that the lack of an explicit formula would prevent the use of all the efficient refinements of SMT solvers, which are based on a tight integration between SAT and theory solvers. A possible alternative is to pursue an incremental feature selection strategy, iteratively solving increasingly complex approximations of the underlying problem.
Appendix A. Additional discussion about the state of the art of preference elicitation

This section reviews two notable state-of-the-art approaches to preference elicitation: the body of work adopting the minimax regret criterion HEIE] and the more recent line of research im developed within the Constraint Programming community. In particular, the latter method shares with CLEO a constraint-based approach to preference elicitation, resulting in a combinatorial formulation of the DM preferences. However, neither method is suitable for the realistic recommendation tasks considered in our experimental setting, which are characterized by inaccurate and inconsistent human feedback. In the following, we review these alternative approaches in detail and compare them with CLEO.
Appendix A.1. Minimax regret-based approaches

The methods developed in the papers [H ISj |6] perform preference elicitation under strict uncertainty. They assume a parametric formulation of the candidate utility function (hypothesis) in the feasible utility set U. The parametrization enables a compact specification of the feasible set, which is represented by bounds and constraints on the parameters. Uncertainty is thus reduced by tightening the constraints or by increasing (decreasing) the lower (upper) bounds.

To make decisions with partial utility information under strict uncertainty and, in particular, to select the final configuration to be returned to the DM, the minimax regret decision criterion is used. It prescribes the configuration that minimizes the maximum regret with respect to all the possible realizations of the DM utility function in the set U. Thus, the minimax regret criterion minimizes the worst-case loss with respect to the possible realizations of the DM utility function. In detail, the minimax regret criterion is defined in two stages, building on the maximum pairwise regret and the maximum regret. The maximum pairwise regret of configuration x with respect to configuration x' over the feasible utility set U is defined as:

$$R(x, x', U) = \max_{u \in U} \left[ u(x') - u(x) \right] \qquad \text{(A.1)}$$

This formulation can be interpreted by assuming an adversary that can impose any DM utility function u in U and chooses the one that maximizes the regret of selecting configuration x. The function $u^{w} = \arg\max_{u \in U} \left[ u(x') - u(x) \right]$ is thus termed the “adversary’s utility” or “witness utility”. The maximum regret of choosing configuration x with respect to the feasible utility set U is defined as:

$$MR(x, U) = \max_{x'} R(x, x', U) \qquad \text{(A.2)}$$

Within the “adversary metaphor”, let us note that the x' chosen by the adversary for the specific x is the optimal decision under $u^{w}$ (i.e., x' maximizes $u^{w}$): any alternative choice would give the adversary less utility and thus reduce the user regret. Finally, the minimax regret of the feasible utility set U is as follows:

$$MMR(U) = \min_{x} MR(x, U) \qquad \text{(A.3)}$$

and the configuration $x^{*} = \arg\min_{x} MR(x, U)$ minimizing the maximum regret is the configuration recommended to the DM by the minimax regret decision criterion. The quality of configuration $x^{*}$ is guaranteed to be no more than MMR(U) away from the quality of the DM favourite configuration, and no alternative configuration has a better guarantee, i.e., $MR(x, U) \geq MMR(U)$ for all $x \neq x^{*}$.
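Equations (A.1)–(A.3) can be evaluated directly when both the configuration set and the feasible utility set U are finite and small. The sketch below makes that simplifying assumption, representing each utility as a hypothetical table that maps configurations to values. The cited approaches instead work with parametric utility sets and compute the minimax regret by optimization rather than enumeration.

```python
def pairwise_regret(x, x_prime, U):
    """R(x, x', U): worst-case utility gap of choosing x instead of x' (Eq. A.1)."""
    return max(u[x_prime] - u[x] for u in U)

def max_regret(x, X, U):
    """MR(x, U): regret of x against the adversary's best alternative (Eq. A.2)."""
    return max(pairwise_regret(x, x_prime, U) for x_prime in X)

def minimax_regret(X, U):
    """Return (MMR(U), x*), with x* the minimax-regret configuration (Eq. A.3)."""
    x_star = min(X, key=lambda x: max_regret(x, X, U))
    return max_regret(x_star, X, U), x_star

X = ["a", "b", "c"]
U = [{"a": 10, "b": 6, "c": 0},   # a utility favouring a
     {"a": 0, "b": 6, "c": 10}]   # a utility favouring c
print(minimax_regret(X, U))  # (4, 'b')
```

The compromise b is recommended. It is optimal under neither utility, but its worst-case loss (4) is smaller than that of a or c (10).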

The initial bounds on the utility parameters defined by the DM are usually not tight enough to identify configurations with provably low regret, and a configuration satisfying the DM cannot be recommended without eliciting additional preference information. This is achieved through an interactive elicitation algorithm that asks queries to the DM and, based on the information elicited, refines the bounds and the constraints on the utility parameters. The generic framework of the approach is as follows:

input: initial constraints (e.g., bounds) on the utility parameters, defining the initial feasible set U
compute the minimax regret MMR(U)
repeat until the termination criterion is met:
    ask query q
    refine U by updating the constraints over the utility parameters to reflect the response to q
    recompute MMR(U) with respect to the refined set U
return to the DM the configuration x* minimizing MR(x, U)

Computationally tractable techniques to compute the minimax regret MMR(U) have been proposed HEIE]. The iterative algorithm may be stopped by the DM when she is satisfied with the returned configuration $x^{*}$, or when the minimax regret MMR(U) reaches a certain level r. When the minimax regret is reduced to zero, the configuration $x^{*}$ returned by the algorithm is guaranteed to be the DM favourite configuration. The minimax regret-based approach also provides a principled method to define informative queries to be asked to the DM (query selection), and different query strategies have been proposed [U [5] [6].
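The generic loop above can be sketched in the same finite setting. Each query compares the current minimax-regret configuration x* with the adversary's witness configuration, loosely in the spirit of the query strategies of the cited works. This choice, the finite utility set and the simulated DM are assumptions for illustration. Refinement simply discards the utilities that are inconsistent with the answer.

```python
def max_regret(x, X, U):
    """MR(x, U) = max over x' in X and u in U of u(x') - u(x)."""
    return max(u[xp] - u[x] for u in U for xp in X)

def elicit(X, U, prefers, max_queries=10):
    """Regret-based elicitation with pairwise comparison queries over a finite U.
    prefers(x, y) returns True if the DM weakly prefers x to y."""
    U = list(U)
    for _ in range(max_queries):
        x_star = min(X, key=lambda x: max_regret(x, X, U))
        if max_regret(x_star, X, U) == 0:   # termination: zero minimax regret
            break
        # Witness configuration: the adversary's best alternative to x_star.
        x_w = max(X, key=lambda xp: max(u[xp] - u[x_star] for u in U))
        if prefers(x_star, x_w):            # keep only utilities consistent with the answer
            U = [u for u in U if u[x_star] >= u[x_w]]
        else:
            U = [u for u in U if u[x_w] >= u[x_star]]
    x_star = min(X, key=lambda x: max_regret(x, X, U))
    return x_star, U

X = ["a", "b", "c"]
U0 = [{"a": 10, "b": 6, "c": 0}, {"a": 0, "b": 6, "c": 10}, {"a": 5, "b": 9, "c": 1}]
true_u = U0[0]  # simulated DM utility
choice, U_final = elicit(X, U0, lambda x, y: true_u[x] >= true_u[y])
print(choice, len(U_final))  # a 1
```

With noisy answers, the refinement step can discard the true utility, and U can even become empty, in which case `max` raises `ValueError`. This is the fragility discussed in Appendix A.1.1.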

Appendix A.1.1. Comparison with CLEO

While CLEO is a preference elicitation method that is approximately correct with high probability, the minimax regret-based approaches assume an adversarial entity that acts to maximize the DM regret. They aim at beating the adversary by recommending the best configuration with respect to the worst-case loss. However, this adversarial model is not always strongly motivated by real-world applications, where users are typically interested in the results actually obtained rather than in regret. The main advantage of the regret-based approaches with respect to CLEO is their ability to provide a lower bound on the quality of the recommended configuration and to guarantee convergence to provably optimal results. However, these theoretical guarantees are valid only under the assumption that the feasible set U contains the true DM utility function at every iteration of the elicitation process. That is, the regret-based methods do not consider the uncertain and inconsistent preference information that characterizes typical human decision processes. As a matter of fact, uncertain feedback from the DM translates into constraints on the utility parameters that can potentially rule out the true utility from the feasible set U. Furthermore, the best performance observed in the experiments presented in [1] is achieved by query strategies that include standard gamble queries, which require the users to state their preference over a probability distribution of configurations. These queries demand a higher cognitive load from the DM than the comparison queries adopted by CLEO, and thus in real-world applications they are more prone to errors and inconsistent answers. Without suitable modifications (e.g., constraint relaxation) to recover from the inevitable uncertain and inconsistent preference information elicited from the DM, regret-based approaches cannot be applied to the realistic problem settings and the noisy test cases that we consider in this work.

Appendix A.2. Preference elicitation methods based on constraint satisfaction

Recent work in the field of constraint programming [10] shares with CLEO the combinatorial approach to modelling user preferences. It defines the user preferences in terms of soft constraints and introduces constraint optimization problems where the DM preferences are not completely known before the solving process starts. Let us first briefly describe the c-semiring formalism [H] adopted in [10] to model soft constraints.

In soft constraints, a generalization of hard constraints, each assignment to the variables of a constraint is associated with a preference value taken from a preference set. The preference value represents the level of desirability of the assignment to the variables of the constraint. As the preference score is associated with a partial assignment to the problem variables, it represents a local preference value. The desirability of a complete assignment is defined by a global preference score, computed by applying a combination operator to the local preference values. A set of soft constraints generates an order (partial or total) over the complete assignments of the variables of the problem. Given two solutions of the problem, the preferred one is selected by computing their global preference levels. Soft constraints are represented by an algebraic structure called a c-semiring (where the letter “c” stands for “constraint”), providing two operations for combining (×) and comparing (+) preference values. In detail, the c-semiring is a tuple $\langle A, +, \times, \mathbf{0}, \mathbf{1} \rangle$ where:

• A is a set and $\mathbf{0}, \mathbf{1} \in A$;

• + is commutative, associative and idempotent; $\mathbf{0}$ is its unit element and $\mathbf{1}$ is its absorbing element;

• × is commutative, associative and distributes over +; $\mathbf{1}$ is its unit element and $\mathbf{0}$ is its absorbing element.

Let us note that a c-semiring is a semiring with additional properties for the two operations: the operation + must be idempotent and have $\mathbf{1}$ as absorbing element, and the operation × must be commutative. The relation $\leq_A$ over A, defined by $a_2 \leq_A a_1$ iff $a_2 + a_1 = a_1$, is a partial order, with $\mathbf{0}$ and $\mathbf{1}$ its minimum and maximum elements, respectively. This relation allows comparing (some of) the desirability levels, with $a_2 \leq_A a_1$ meaning that $a_1$ is “better” than $a_2$; $\mathbf{0}$ and $\mathbf{1}$ represent the worst and the best preference levels, respectively, and the operations + and × are monotone on $\leq_A$. Consider, e.g., the following instance of c-semiring:

$$\langle \{5, 10, 15, \ldots, 50\}, \max, \min, 5, 50 \rangle$$

with preference values from the set {5, 10, 15, ..., 50} and elements $\mathbf{0}$ and $\mathbf{1}$ represented by the values 5 and 50, respectively. The desirability of a complete assignment is obtained by taking its minimum local preference value. A complete assignment $c_1$ with preference score $a_1$ is preferred to a complete assignment $c_2$ with a lower preference score $a_2$. That is, $a_2 \leq_A a_1$ iff $\max(a_2, a_1) = a_1$.
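The instance above can be mirrored in a few lines. The sketch below is a hypothetical encoding of a c-semiring as its two binary operations plus the two distinguished elements. It shows how the order is derived from + (here max) and how the local preferences of a complete assignment are combined with × (here min). The class and method names are illustrative only.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Callable, Generic, Iterable, TypeVar

T = TypeVar("T")

@dataclass(frozen=True)
class CSemiring(Generic[T]):
    plus: Callable[[T, T], T]   # comparison operator (+)
    times: Callable[[T, T], T]  # combination operator (x)
    zero: T                     # worst preference level
    one: T                      # best preference level (indifference)

    def leq(self, a: T, b: T) -> bool:
        """a <=_A b iff a + b = b, i.e., b is at least as good as a."""
        return self.plus(a, b) == b

    def combine(self, local_values: Iterable[T]) -> T:
        """Global preference of a complete assignment from its local values."""
        return reduce(self.times, local_values, self.one)

S = CSemiring(plus=max, times=min, zero=5, one=50)

c1 = S.combine([30, 45, 20])  # local preferences of assignment c1
c2 = S.combine([35, 15])      # local preferences of assignment c2
print(c1, c2, S.leq(c2, c1))  # 20 15 True
```

Since × is min, combining never raises the level above any single local value, which matches the negative-preference reading discussed next.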

The generality of the semiring-based soft constraint formalism makes it possible to express several kinds of preferences, including partially ordered ones. For example, different instances of c-semirings encode weighted or probabilistic soft constraint satisfaction problems [12]. However, the c-semiring formalism can model only negative preferences. First, the best element in the ordering induced by $\leq_A$, denoted by $\mathbf{1}$, behaves as indifference, since $\mathbf{1} \times a = a$ for all $a \in A$. This result is consistent with intuition: when using only negative preferences, indifference is the best level of desirability that can be expressed. Furthermore, the combination of desirability levels returns a lower overall preference, since $a \times b \leq_A a, b$, again consistent with the fact that only negative preferences are involved.

Preference elicitation strategies have been introduced [10] within this formalism in order to deal with scenarios where preference value information is partially unknown. Some of the local preference values attached to soft constraints are assumed to be missing. The DM is then asked for explicit feedback on specific assignments for these constraints, in terms of score values quantifying her preference for a certain assignment. The elicitation strategy aims at minimizing the number of queries to the DM.


Appendix A.2.1. Comparison with CLEO 

Concerning the expressivity of the representation formalisms, the work in shows how to encode semiring-based soft constraint satisfaction problem (SCSP) instances into equivalent weighted MAX-SAT formulations. Each solution of the latter instance corresponds to a solution of the former one. Details on the encoding algorithm can be found in Appendix A.2.2. The rationale for the MAX-SAT encoding is to exploit the efficient and widely studied techniques implemented in modern SAT solvers, which can efficiently handle large structured problems [H]. In principle, the encoding can also be applied to SCSPs with continuous decision variables or with discrete variables defined over large finite domains, possibly at the cost of a significant blow-up in the translation. In this case, one may cast the SCSP instance into a weighted MAX-SMT rather than a weighted MAX-SAT formulation.

Concerning the preference elicitation setting, our formulation assumes a
much more limited amount of initial knowledge about the problem to be
optimized. In the work on preference elicitation for SCSPs [10], decision
variables, soft constraint topology and structure are assumed to be known in
advance, and the incomplete initial information consists only of missing local
preference values. CLEO assumes complete ignorance about the structure
of the constraints over the decisional variables of the user. The initial problem
knowledge is limited to a set of catalog attributes. CLEO extracts the
decisional items of the DM from the set of catalog attributes and learns the
weighted constraints built from them that model the DM preferences.
If the MAX-SAT encoding is applied to an SCSP with missing preferences,
it produces a Boolean formula where some of the term weights are
not known. CLEO, on the other hand, handles MAX-SAT instances where
both the constraints and their associated weights are initially unknown and
are learnt by interacting with the DM.

Furthermore, the technique in [10] is based on local elicitation queries,
with the final user asked to reveal her preferences about assignments for
specific soft constraints. Global preferences, or bounds on global preferences,
associated with complete solutions of the problem are derived from the local
preference information. CLEO goes in the opposite direction: it asks the
user to compare complete solutions and learns local utilities (i.e., the weights
of the constraints of the logic formula) from global preference values. In
many cases, recognizing appealing or unsatisfactory global solutions may be
much easier than defining local utility functions associated with partial
solutions. For example, when scheduling a set of activities, evaluating
complete schedules may be more affordable than assessing how specific
ordering choices between pairs of activities contribute to the global preference
value. Moreover, the preference elicitation technique in [10] asks the DM
for quantitative evaluations of partial solutions: she does not just rank pairs
of activities, but provides score values quantifying her preference for the
partial activity rankings, a much more demanding task.
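The idea of learning local utilities from global comparisons can be illustrated with a deliberately simple stand-in. The sketch below uses a pairwise ranking perceptron, not CLEO's sparse learning-to-rank refinement, on a hypothetical scheduling problem with three activities whose Boolean features are the ordering constraints between pairs of activities. A single comparison of two complete schedules is enough to move the weights of the local ordering constraints.

```python
# Hypothetical scheduling example: activities A, B, C; each feature is a local
# ordering constraint between a pair of activities.
PAIRS = [("A", "B"), ("B", "C"), ("A", "C")]


def phi(schedule):
    """0/1 feature vector: which ordering constraints a complete schedule satisfies."""
    pos = {act: k for k, act in enumerate(schedule)}
    return [1.0 if pos[x] < pos[y] else 0.0 for x, y in PAIRS]


def perceptron_update(w, preferred, other, lr=1.0):
    """Pairwise ranking perceptron step (a simple stand-in learner): if w does not
    score `preferred` above `other`, move w along phi(preferred) - phi(other)."""
    diff = [a - b for a, b in zip(phi(preferred), phi(other))]
    if sum(wi * di for wi, di in zip(w, diff)) <= 0:
        w = [wi + lr * di for wi, di in zip(w, diff)]
    return w


# Global feedback only: the DM prefers schedule B, A, C over schedule C, A, B.
w = perceptron_update([0.0, 0.0, 0.0], ("B", "A", "C"), ("C", "A", "B"))
print({f"{x} before {y}": wi for (x, y), wi in zip(PAIRS, w)})
# {'A before B': -1.0, 'B before C': 1.0, 'A before C': 1.0}
```

The DM never scores an individual ordering choice; the local weights are inferred from which complete schedule she prefers.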

To ease the burden on the decision maker of specifying precise
preference scores, interval-valued constraints [H] allow users to
state an interval of utility values for each instantiation of the variables of a
constraint. Indeed, informal definitions of degrees of preference
such as “quite high”, “more or less”, “low” or “undesirable” cannot be
naturally mapped to precise preference scores. However, the technique
described in jin] requires the user to provide all the information she has about
the problem (in terms of preference intervals) before the solving phase,
without seeing any optimization result.

Even though interval-valued constraints 051 have been introduced to handle
uncertainty in the evaluations of the DM, inconsistent preference information
is not addressed IIDI: consistent preference information is required to retain
the optimality guarantees provided by the preference elicitation strategy.
Conversely, CLEO trades optimality for robustness and can effectively deal
with imprecise information from the DM, modelled in terms of inaccurate
rankings of the candidate solutions.

Finally, while the work in [10] considers unipolar preference problems,
modeling only negative preferences, CLEO naturally accounts for bipolar
preference problems, with the final user specifying what she likes and what
she dislikes. Bipolar preference problems provide a better representation of
the typical human decision process, where the degree of preference for a
solution reflects the compensation obtained by weighing its advantages
against its disadvantages. Note that the work in [32] extends the soft
constraint formalism to account for bipolar preference problems.
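The difference between unipolar and bipolar scoring can be seen on a toy case. In the sketch below, the unipolar score combines levels with $\min$, as in the fuzzy-style semiring of Appendix A.2.2, so no set of constraints can rise above indifference $\mathbf{1} = 50$; the bipolar score is an additive utility with signed weights, in the spirit of the weighted combination of features used by CLEO. The feature names and weights are hypothetical.

```python
def unipolar_score(levels, one=50):
    """Unipolar combination with x = min: the empty combination is the
    indifference element `one`, and every extra level can only lower the score."""
    return min(levels, default=one)


def bipolar_utility(satisfied, weights):
    """Additive bipolar utility: liked features (positive weights) raise the score,
    disliked ones (negative weights) lower it, so they can compensate each other."""
    return sum(weights[name] for name in satisfied)


weights = {"near_beach": 3.0, "noisy": -2.0}  # hypothetical features and weights

print(unipolar_score([]), unipolar_score([40]), unipolar_score([40, 30]))
# 50 40 30: nothing rises above indifference
print(bipolar_utility([], weights), bipolar_utility(["near_beach"], weights),
      bipolar_utility(["near_beach", "noisy"], weights))
# 0 3.0 1.0: a liked feature rises above neutral, a disliked one offsets it
```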


Appendix A.2.2. Encoding SCSP into weighted MAX-SAT instances

The work in [33] introduces a method to encode a semiring-based soft
constraint satisfaction problem (SCSP) instance into a weighted MAX-SAT
instance, with each solution of the generated MAX-SAT instance corresponding
to a solution of the original SCSP. Without loss of generality, assume a soft
constraint problem with $n$ variables $v_1, \ldots, v_n$ having domains $D_1, \ldots, D_n$, and
$m$ constraints $c_1, \ldots, c_m$. Each instantiation of the variables of a constraint
$c_j$, $j = 1, \ldots, m$, is associated with a value from the c-semiring $(A, +, \times, \mathbf{0}, \mathbf{1})$.
For each variable $v_i$, $i = 1, \ldots, n$, and each value $d \in D_i$, a Boolean variable
$b_{i,d}$ is introduced. When $b_{i,d}$ is set to true, $v_i$ is assigned the value $d \in D_i$.
The variables $b_{i,d}$, $i = 1, \ldots, n$, $d \in D_i$, are the Boolean variables of the
weighted MAX-SAT problem.

The set of Boolean constraints of the MAX-SAT problem consists of
clauses ensuring that each variable $v_i$, $i = 1, \ldots, n$, is assigned exactly one
value $d \in D_i$, and of terms representing the soft constraints of the original
SCSP. In the former case, for each variable $v_i$, $i = 1, \ldots, n$, the at-least-one-value
hard clause

$$b_{i,d_1} \lor b_{i,d_2} \lor \cdots \lor b_{i,d_{|D_i|}}$$

and the set of $|D_i|(|D_i| - 1)/2$ binary at-max-one-value hard clauses

$$\neg b_{i,d_j} \lor \neg b_{i,d_k} \quad \text{for every pair } (d_j, d_k) \text{ with } d_j, d_k \in D_i \text{ and } 1 \le j < k \le |D_i|$$

are generated. They ensure that for each $i \in \{1, \ldots, n\}$ exactly one variable
$b_{i,d_j}$, $j \in \{1, 2, \ldots, |D_i|\}$, is set to true.
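A minimal sketch of the hard-clause generation, assuming clauses are represented as Python lists of literals and each Boolean variable $b_{i,d}$ as the pair `(i, d)`; these representation choices are ours, not those of [33]. A brute-force check confirms that the clauses hold exactly when one value is selected, and the clause count for a domain of 100 values hints at the blow-up mentioned in Appendix A.2.1 for large finite domains.

```python
from itertools import combinations, product


def exactly_one_clauses(i, domain):
    """Hard clauses forcing v_i to take exactly one value of `domain`.

    A literal is (var, positive), where var = (i, d) stands for b_{i,d}.
    Returns the at-least-one-value clause followed by the
    |D_i|(|D_i|-1)/2 binary at-max-one-value clauses.
    """
    domain = list(domain)
    at_least_one = [((i, d), True) for d in domain]
    at_max_one = [[((i, dj), False), ((i, dk), False)]
                  for dj, dk in combinations(domain, 2)]
    return [at_least_one] + at_max_one


def clause_holds(clause, assignment):
    """A clause is a disjunction: it holds if at least one literal is true."""
    return any(assignment[var] == positive for var, positive in clause)


domain = [1, 2, 3]
clauses = exactly_one_clauses(1, domain)
print(len(clauses))  # 4: one at-least-one-value clause + 3 at-max-one-value clauses

# Brute force over b_{1,1}, b_{1,2}, b_{1,3}: all clauses hold iff exactly one is true.
for values in product([False, True], repeat=len(domain)):
    assignment = {(1, d): v for d, v in zip(domain, values)}
    all_hold = all(clause_holds(c, assignment) for c in clauses)
    assert all_hold == (sum(values) == 1)

print(len(exactly_one_clauses(1, range(100))))  # 4951 = 1 + 100 * 99 / 2
```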

Each soft constraint of the original SCSP is represented by a set of
weighted Boolean terms encoding all the possible assignments of values (i.e.,
configurations) to its variables. The weight of a term is set to the c-semiring
value associated with the encoded configuration. For example, consider a binary
soft constraint over variables $v_1$ and $v_2$, both with discrete domain $D = \{1, 2, 3\}$,
and with preference scores defined by the semiring $(\{5, 10, 15, \ldots, 50\}, \max, \min, 5, 50)$,
i.e., $+ = \max$, $\times = \min$, $\mathbf{0} = 5$ and $\mathbf{1} = 50$. The possible configurations are
specified in Table A.5 (left).


Each row shows an assignment of values to $v_1$ and $v_2$ and the c-semiring value
associated with the assignment.

$v_1$ | $v_2$ | preference value || num | term                      | weight
------+-------+------------------++-----+---------------------------+-------
  1   |   1   |        10        ||  1  | $(b_{1,1} \land b_{2,1})$ |   10
  1   |   2   |        40        ||  2  | $(b_{1,1} \land b_{2,2})$ |   40
  1   |   3   |        50        ||  3  | $(b_{1,1} \land b_{2,3})$ |   50
  2   |   1   |         5        ||  4  | $(b_{1,2} \land b_{2,1})$ |    5
  2   |   2   |        10        ||  5  | $(b_{1,2} \land b_{2,2})$ |   10
  2   |   3   |        30        ||  6  | $(b_{1,2} \land b_{2,3})$ |   30
  3   |   1   |         5        ||  7  | $(b_{1,3} \land b_{2,1})$ |    5
  3   |   2   |         5        ||  8  | $(b_{1,3} \land b_{2,2})$ |    5
  3   |   3   |        10        ||  9  | $(b_{1,3} \land b_{2,3})$ |   10

Table A.5: (Left) Example of a soft constraint; the DM prefers assignments with $v_1 < v_2$.
(Right) Weighted Boolean terms encoding the soft constraint defined in the left table.
When the Boolean variable $b_{i,d}$, $d \in D_i$, is set to true, $v_i$ is assigned the value $d$.

Given the six Boolean variables $b_{1,d}$ and $b_{2,d}$, $d = 1, 2, 3$, defined as above,
the soft constraint in Table A.5 (left) is encoded into the set of Boolean terms in
Table A.5 (right).
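The construction of Table A.5 (right) can be reproduced in a few lines. The sketch assumes the preference values of Table A.5 (left) stored in a dictionary and represents each term, a conjunction of two positive literals, as a pair of Boolean variables `(i, d)`; these data structures are illustrative.

```python
from itertools import product

# Table A.5 (left): (v1, v2) -> c-semiring preference value.
PREF = {(1, 1): 10, (1, 2): 40, (1, 3): 50,
        (2, 1): 5, (2, 2): 10, (2, 3): 30,
        (3, 1): 5, (3, 2): 5, (3, 3): 10}


def encode_terms(pref):
    """One weighted term per configuration (d1, d2): the conjunction
    b_{1,d1} AND b_{2,d2}, weighted with the configuration's preference value."""
    return [(((1, d1), (2, d2)), w) for (d1, d2), w in sorted(pref.items())]


def term_holds(term, assignment):
    """A term is a conjunction of positive literals: all its variables must be true."""
    return all(assignment[var] for var in term)


terms = encode_terms(PREF)
for num, (((i1, d1), (i2, d2)), w) in enumerate(terms, start=1):
    print(f"{num}: (b_{i1},{d1} AND b_{i2},{d2})  weight {w}")

# The assignment v1 = 2, v2 = 3 sets b_{1,2} and b_{2,3}: only term 6 holds.
assignment = {(i, d): False for i, d in product([1, 2], [1, 2, 3])}
assignment[(1, 2)] = assignment[(2, 3)] = True
print([num for num, (term, w) in enumerate(terms, start=1)
       if term_holds(term, assignment)])  # [6]
```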

A structured MAX-SAT formulation can be obtained by considering generalized
Boolean clauses, each being the disjunction of the terms that encode, for a given
soft constraint, the assignments sharing the same preference value. For
example, the terms defined at rows 1, 5 and 9 of Table A.5 (right) can
be merged into a single generalized weighted clause

$$(b_{1,1} \land b_{2,1}) \lor (b_{1,2} \land b_{2,2}) \lor (b_{1,3} \land b_{2,3})$$

with weight equal to 10. Furthermore, each at-least-one-value and at-max-one-value
hard clause $h$ can be cast into a soft clause represented by its
negation $\neg h$ and associated with the semiring value $\mathbf{0}$ [13]. The value $\mathbf{0}$ is
indeed both the minimum value in the partial order defined by the relation $\leq_A$
and the absorbing element for the operator $\times$ combining the semiring values.
Therefore, a candidate solution $\mathbf{b}$ of the generated MAX-SAT instance that
violates one of the original hard clauses, and hence satisfies the corresponding
soft clause $\neg h$, receives the minimum semiring value $\mathbf{0}$. However, this
implementation of the hard clauses does not make it possible to distinguish
infeasible solutions from feasible ones with the lowest possible preference,
i.e., feasible solutions receiving the lowest semiring value.
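Merging terms into generalized clauses amounts to grouping the configurations of Table A.5 by preference value. In the sketch below, which uses an illustrative dictionary representation of Table A.5 (left), a generalized clause is a list of terms read as a disjunction; since the configurations of a soft constraint are mutually exclusive, a feasible assignment satisfies exactly one generalized clause per soft constraint.

```python
from collections import defaultdict

# Table A.5 (left): (v1, v2) -> c-semiring preference value.
PREF = {(1, 1): 10, (1, 2): 40, (1, 3): 50,
        (2, 1): 5, (2, 2): 10, (2, 3): 30,
        (3, 1): 5, (3, 2): 5, (3, 3): 10}


def generalized_clauses(pref):
    """Map each preference value to a generalized clause: the list (read as a
    disjunction) of the terms b_{1,d1} AND b_{2,d2} sharing that value."""
    groups = defaultdict(list)
    for (d1, d2), w in sorted(pref.items()):
        groups[w].append(((1, d1), (2, d2)))
    return dict(groups)


def clause_holds(clause, assignment):
    """A generalized clause holds if at least one of its terms is fully true."""
    return any(all(assignment[var] for var in term) for term in clause)


clauses = generalized_clauses(PREF)
print(len(clauses))  # 5 distinct preference values: 10, 40, 50, 5, 30
print(clauses[10])   # rows 1, 5 and 9 of Table A.5 (right)

# The feasible assignment v1 = 2, v2 = 2 satisfies only the clause with weight 10.
b = {(i, d): (i, d) in {(1, 2), (2, 2)} for i in (1, 2) for d in (1, 2, 3)}
print([w for w, clause in clauses.items() if clause_holds(clause, b)])  # [10]
```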

Given the generated MAX-SAT formulation, the optimization task consists of
finding the assignment $\mathbf{b}^*$ to the Boolean variables $b_{i,d}$, $i = 1, \ldots, n$,
$d \in D_i$, that maximizes $f(\mathbf{b})$, where $f(\mathbf{b})$ is the semiring value obtained by
combining, through the operator $\times$, the weights of the solution components
satisfied by $\mathbf{b}$. Each candidate solution $(\mathbf{b}, f(\mathbf{b}))$ of the generated MAX-SAT
instance identifies an assignment of values to the variables $v_i$, $i = 1, \ldots, n$, of
the original SCSP with associated semiring value $f(\mathbf{b})$.
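For a problem as small as the one in Table A.5, the optimization task can be solved by exhaustive enumeration of the $2^6$ assignments to $b_{1,d}$ and $b_{2,d}$. The sketch below, a brute-force stand-in for an actual weighted MAX-SAT solver, scores each assignment by combining with $\times = \min$ the weights of the satisfied terms and of the satisfied negated hard clauses $\neg h$ (weight $\mathbf{0} = 5$). It recovers $v_1 = 1$, $v_2 = 3$ with value 50, and also exhibits the limitation noted above: an infeasible assignment scores the same as the worst feasible one.

```python
from itertools import combinations, product

# Table A.5 (left) and the example semiring ({5, ..., 50}, max, min, 5, 50).
PREF = {(1, 1): 10, (1, 2): 40, (1, 3): 50,
        (2, 1): 5, (2, 2): 10, (2, 3): 30,
        (3, 1): 5, (3, 2): 5, (3, 3): 10}
ZERO, ONE = 5, 50
DOMAIN = [1, 2, 3]
VARS = [(i, d) for i in (1, 2) for d in DOMAIN]  # Boolean variables b_{i,d}


def satisfied_weights(b):
    """Weights of the solution components satisfied by the Boolean assignment b."""
    weights = [w for (d1, d2), w in PREF.items() if b[(1, d1)] and b[(2, d2)]]
    for i in (1, 2):
        # Negated at-least-one-value clause: holds if v_i receives no value.
        if not any(b[(i, d)] for d in DOMAIN):
            weights.append(ZERO)
        # Negated at-max-one-value clauses: hold if v_i receives two values.
        for dj, dk in combinations(DOMAIN, 2):
            if b[(i, dj)] and b[(i, dk)]:
                weights.append(ZERO)
    return weights


def f(b):
    """Combine the satisfied weights with x = min (ONE is the identity of x)."""
    return min(satisfied_weights(b), default=ONE)


# Exhaustive search over all 2^6 assignments; max follows <=_A since + = max.
assignments = [dict(zip(VARS, bits))
               for bits in product([False, True], repeat=len(VARS))]
best = max(assignments, key=f)
print(sorted(v for v, val in best.items() if val), f(best))  # [(1, 1), (2, 3)] 50

# Feasible v1 = 3, v2 = 1 and infeasible "two values for v1" both score 0 = 5.
feasible = {v: v in {(1, 3), (2, 1)} for v in VARS}
infeasible = {v: v in {(1, 1), (1, 2), (2, 3)} for v in VARS}
print(f(feasible), f(infeasible))  # 5 5
```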